IEEE/CAA Journal of Automatica Sinica
Citation: C. H. Liu, F. Zhu, Q. Liu, and Y. C. Fu, "Hierarchical Reinforcement Learning With Automatic Sub-Goal Identification," IEEE/CAA J. Autom. Sinica, vol. 8, no. 10, pp. 1686–1696, Oct. 2021. doi: 10.1109/JAS.2021.1004141