Teng Liu, Bin Tian, Yunfeng Ai, Li Li, Dongpu Cao and Fei-Yue Wang, "Parallel Reinforcement Learning: A Framework and Case Study," IEEE/CAA J. Autom. Sinica, vol. 5, no. 4, pp. 827-835, July 2018. doi: 10.1109/JAS.2018.7511144

Parallel Reinforcement Learning: A Framework and Case Study

doi: 10.1109/JAS.2018.7511144
Funds:

the National Natural Science Foundation of China 61503380

the Natural Science Foundation of Guangdong Province, China 2015A030310187

  • In this paper, a new machine learning framework, called parallel reinforcement learning, is developed for complex system control. To overcome the data deficiency of current data-driven algorithms, a parallel system is built to improve the complex learning system by self-guidance. Based on Markov chain (MC) theory, we combine transfer learning, predictive learning, deep learning and reinforcement learning to tackle the data and action processes and to express the knowledge. The parallel reinforcement learning framework is formulated, and several case studies for real-world problems are introduced.

     

  • Machine learning, especially deep reinforcement learning (DRL), has experienced ultrafast development in recent years [1], [2]. Whether in traditional visual detection [3], dexterous manipulation in robotics [4], energy efficiency improvement [5], object localization [6], Atari games [7], [8], Leduc poker [9], the Doom game [10] or text-based games [11], these data-driven learning approaches show great potential in improving performance and accuracy. However, several issues still impede researchers from applying DRL to real complex system problems.

    One issue is the lack of generalization capability to new goals [3]. DRL agents need to collect new data and learn new model parameters for each new target, and retraining the learning model is computationally expensive. Hence, the limited data must be utilized well so that the learning system can accommodate new environments.

    Another issue is data inefficiency [8]. Acquiring large-scale action and interaction data from real complex systems is arduous, and it is very difficult for learning systems to explore control policies entirely by themselves. Thus, it is necessary to create a large number of observations for action and knowledge from the historically available data.

    The final issue is data dependency and distribution. In practical systems, the dependency among data samples is often uncertain and the probability distribution usually varies. It is therefore hard for DRL agents to consider the state, action and knowledge of a learning system in an integrated way.

    To address these difficulties, we develop a new parallel reinforcement learning framework for complex system control in this paper. We construct an artificial system analogous to the real system via modelling to constitute a parallel system. Based on Markov chain (MC) theory, transfer learning, predictive learning, deep learning and reinforcement learning are combined to tackle the data and action processes and to express knowledge. Furthermore, several application cases of parallel reinforcement learning are introduced to illustrate its usability. Note that the technique proposed in this paper can be regarded as a specification of the parallel learning framework in [12].

    Fei-Yue Wang first proposed parallel system theory in 2004 [13], [14]. In [13] and [14], the ACP method was proposed to deal with complex system problems. The ACP approach comprises artificial societies (A) for modelling, computational experiments (C) for analysis, and parallel execution (P) for control. An artificial system is usually built by modelling to explore the data and knowledge as the real system does. Through executing independently and complementarily in these two systems, the learning model can be made more efficient and less data-hungry. The ACP approach has been applied in several fields to address different problems in complex systems [15]-[17].

    Transfer learning focuses on storing knowledge gained while solving one problem and applying it to a different but related problem. Taking vehicle driving cycles as an example, we introduce mean tractive force (MTF) components to achieve an equivalent transformation of them. By transferring the limited data via the MTF, the generalization capability problem can be relieved.

    Predictive learning tries to use prior knowledge to build a model of the environment by trying out different actions in various circumstances. Taking power demand as an example, we introduce a fuzzy encoding predictor to forecast the future power demand over different time steps. Based on the MC, historically available data can be used to alleviate data inefficiency.

    Deep learning learns data representations using multiple layers of nonlinear processing units, with supervised or unsupervised learning of the feature representations in each layer. Reinforcement learning is concerned with how agents ought to take actions in an environment so as to maximize some notion of cumulative reward. The main contribution of this paper is combining the parallel system with transfer learning, predictive learning, deep learning and reinforcement learning to formulate the parallel reinforcement learning framework and address the data dependency and distribution problems in real-world complex systems.

    The rest of this paper is organized as follows. Section Ⅱ introduces the parallel reinforcement learning framework and relevant components, then several case studies for real-world complex system problems are described in Section Ⅲ. Finally, we conclude the paper in Section Ⅳ.

    The purpose of parallel reinforcement learning is to build a closed loop of data and knowledge in the parallel system that determines the next operation in each system, as shown in Fig. 1. The data represent the inputs and parameters of the artificial and real systems. The knowledge means the records mapping the state space to the action space, which we call experience in the real system and policy in the artificial system. The experience can be used to rectify the artificial model, and the updated policy is utilized to guide the real actor along with feedback from the environment.

    Figure  1.  Parallel reinforcement learning framework.

    Cyber-physical systems have attracted increasing attention over the past two decades for their potential to fuse computational processes with the physical world. Furthermore, cyber-physical-social systems (CPSS) augment cyber-physical system capacity by integrating human and social characteristics to achieve more effective design and operation [18]. The ACP-driven parallel system framework is depicted in Fig. 2. The integration of the real and artificial systems as a whole is called a parallel system.

    Figure  2.  ACP-driven parallel system framework.

    In this framework, the physically-defined real system interacts with the software-defined artificial system via three coupled modules within the CPSS: control and management, experiment and evaluation, and learning and training. The first module acts as the decision maker in the two systems, the second as the evaluator, and the third as the learning controller.

    ACP = Artificial societies + Computational experiments + Parallel execution. The artificial system is often constructed by descriptive learning based on observations of the real system, enabled by developments in information and communication technologies. It can help the learning controller store more computing results and make more flexible decisions. Thus, the artificial system is parallel to the real system and runs asynchronously to stabilize the learning process and extend the learning capability.

    In the computational experiment stage, the specifications of transfer learning, predictive learning and deep learning are formulated using MC theory, as discussed later. For the parallel system, combining these learning processes with reinforcement learning yields parallel reinforcement learning, which derives the experience and policy and clarifies the interaction between them. For a general parallel intelligent system, such knowledge can be applied to different tasks because the learning controller can handle several tasks via rational reasoning [19].

    Finally, parallel execution between the artificial and real systems is expected to enable optimal operation of both systems [20]. Although the artificial system is built from prior data of the real system, it is rectified and improved by further observation. The continuously updated knowledge in the artificial system is, in turn, used to instruct the real system operation in an efficient way. Owing to this communication of data and knowledge through parallel execution, the two systems improve each other by self-guidance.
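    To make this loop concrete, the following minimal Python sketch outlines the parallel execution cycle; the `real_sys`, `artificial_sys` and `policy` objects are hypothetical placeholders introduced only for illustration, not components defined in the paper:

```python
# Minimal sketch of the parallel-execution loop between the real and artificial
# systems. The object interfaces below are illustrative placeholders only.

def parallel_execution(real_sys, artificial_sys, policy, iterations=100):
    """Closed loop of data and knowledge in the parallel system."""
    for _ in range(iterations):
        # The current policy guides the real actor; the run yields experience
        # (state-to-action records plus environmental feedback).
        experience = real_sys.run(policy)

        # Experience rectifies and improves the artificial model.
        artificial_sys.update_model(experience)

        # The artificial system explores cheaply and returns an updated policy.
        policy = artificial_sys.improve_policy(policy)
    return policy
```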

    In this paper, we choose driving cycles as an example to introduce transfer learning; the approach can be easily generalized to other data in the MC domain. A general driving cycle transformation methodology based on the mean tractive force (MTF) components is introduced in this section. This transformation converts the existing driving cycle database into an equivalent one with a real MTF value to relieve the data scarcity problem.

    MTF is defined as the tractive energy divided by the distance traveled over a whole driving cycle, integrated over the entire time interval $[0, T]$ as follows:

$$\bar{F} = \frac{1}{x_T}\int_{0}^{T} F(t)\,v(t)\,dt \tag{1}$$

    where $x_T$ is the total distance traveled in a certain driving cycle, calculated as $x_T = \int_{0}^{T} v(t)\,dt$, and $v$ is the vehicle speed of that driving cycle. $F$ is the longitudinal force propelling the vehicle, computed as

$$\begin{cases} F = F_a + F_r + F_m \\ F_a = \dfrac{1}{2}\rho_a C_d A v^2, \quad F_r = M_v g f, \quad F_m = M_v a \end{cases} \tag{2}$$

    where $F_a$ is the aerodynamic drag, $F_r$ the rolling resistance and $F_m$ the inertial force. $\rho_a$ is the air density, $C_d$ the aerodynamic coefficient, and $A$ the frontal area. $M_v$ is the curb weight, $g$ the gravitational acceleration, $f$ the rolling friction coefficient and $a$ the acceleration.

    The vehicle operating modes are divided into traction, coasting, braking and idling according to the force imposed on the vehicle powertrain [21]. Hence, the time interval is partitioned as

$$\begin{cases} T = T_{tr} \cup T_{co} \cup T_{br} \cup T_{id} \\ T_{tr} = \{t \mid F(t) > 0,\ v(t) \neq 0\}, \quad T_{co} = \{t \mid F(t) = 0,\ v(t) \neq 0\} \\ T_{br} = \{t \mid F(t) < 0,\ v(t) \neq 0\}, \quad T_{id} = \{t \mid v(t) = 0\} \end{cases} \tag{3}$$

    where $T_{tr}$ and $T_{co}$ are the traction-mode and coasting-mode regions, respectively, $T_{br}$ represents the braking region and $T_{id}$ is the idling set.

    From (3), it is obvious that the powertrain provides positive power to the wheels only in the traction region. The MTF in (1) is then specialized as follows:

$$\bar{F} = \frac{1}{x_L}\int_{t \in T_{tr}} F(t)\,v(t)\,dt = \bar{F}_a + \bar{F}_r + \bar{F}_m. \tag{4}$$

    Then, the MTF components $(\alpha, \beta, \gamma)$ are statistical characteristic measures of a driving cycle, defined as [22]

$$\begin{cases} \alpha = \dfrac{\bar{F}_a}{\frac{1}{2}\rho_a C_d A} = \dfrac{1}{x_L}\displaystyle\int_{t \in T_{tr}} v^3(t)\,dt \\[2mm] \beta = \dfrac{\bar{F}_r}{M_v g f} = \dfrac{1}{x_L}\displaystyle\int_{t \in T_{tr}} v(t)\,dt \\[2mm] \gamma = \dfrac{\bar{F}_m}{M_v} = \dfrac{1}{x_L}\displaystyle\int_{t \in T_{tr}} a(t)\,v(t)\,dt. \end{cases} \tag{5}$$

    Note that the MTF components are determined by the speed and acceleration of a specific driving cycle. These measures are employed as the constraints for driving cycle transformation.
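    As a numerical illustration of (1)-(5), the following sketch computes the MTF components from a sampled speed trace; the vehicle parameters are illustrative placeholders, not values taken from the paper or [21], [22]:

```python
import numpy as np

def mtf_components(v, dt, rho_a=1.2, Cd=0.3, A=2.3, Mv=1500.0, g=9.81, f=0.012):
    """Compute the MTF components (alpha, beta, gamma) of a driving cycle per (1)-(5).

    v  : sampled vehicle speed [m/s]
    dt : sampling interval [s]
    The vehicle parameters are illustrative defaults, not values from the paper.
    """
    a = np.gradient(v, dt)                                    # acceleration
    F = 0.5 * rho_a * Cd * A * v**2 + Mv * g * f + Mv * a     # longitudinal force, (2)
    x_total = np.sum(v) * dt                                  # distance travelled

    traction = (F > 0) & (v > 0)                              # traction region T_tr, (3)
    alpha = np.sum(v[traction]**3) * dt / x_total             # aerodynamic measure
    beta = np.sum(v[traction]) * dt / x_total                 # rolling-resistance measure
    gamma = np.sum(a[traction] * v[traction]) * dt / x_total  # inertial measure
    return alpha, beta, gamma

# Example: a synthetic 60 s speed trace sampled at 1 Hz
t = np.arange(0, 60, 1.0)
v = 10.0 + 5.0 * np.sin(0.1 * t)
print(mtf_components(v, dt=1.0))
```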

    By definition, the MTF is unique for a specific driving cycle; thus, inequality and equality constraints are employed to determine the transferred driving cycle. A cost function can be defined by the designer to choose an optimal equivalent cycle from the set of feasible solutions. The transformation is formulated as a nonlinear program (NLP) as

$$\begin{aligned} \min_{\tilde{v}} \quad & f(\tilde{v}) \\ \text{s.t.} \quad & g_i(\tilde{v}, T_{tr}, \alpha, \beta \ \text{or}\ \gamma) = 0, \quad i = 1, 2, 3 \\ & h_1(\tilde{v}, T_{tr}, v_{coast}) < 0 \\ & h_2(\tilde{v}, T_{co} \cup T_{br}, v_{coast}) \geq 0 \end{aligned} \tag{6}$$

    where $\tilde{v}$ is the transferred driving cycle, $(\alpha, \beta, \gamma)$ are the target MTF components, $v_{coast}$ is the vehicle coasting speed, and $g_i$ and $h_j$ are the constraints. Through this process, the transferred driving cycle related to the real conditions can be determined and afterwards used for other operations, such as control and management [21], [22]; see Fig. 3 for an illustration.

    Figure  3.  Transfer learning for driving cycles transformation.

    The purpose of transfer learning is to convert historically available data into equivalent data to expand the database. The transferred data are strongly associated with the real environment and can therefore be used to generate adaptive control and operations in complex systems, so as to solve the generalization capability and data-hunger problems.

    Taking the power demand of a vehicle as an example, we introduce predictive learning to forecast the future power demand based on the observed data and processes in the parallel system. A better understanding of the real system can then be obtained and applied to update the artificial system from these new experiences. A power demand prediction technique based on the fuzzy encoding predictor (FEP) is illustrated in this section. This approach can also be used to draw more future knowledge from experience for other parameters in complex systems.

    Power demand is modelled as a finite-state MC [23] and described as $P_{dem} = \{p_j \mid j = 1, \ldots, M\} \subset X$, where $X \subset \mathbb{R}$ is bounded. The transition probability of the power demand is calculated by the maximum likelihood estimator as

$$\begin{cases} \pi_{ij} = P(p^{+} = p_j \mid p = p_i) = \dfrac{N_{ij}}{N_i} \\ N_i = \displaystyle\sum_{j=1}^{M} N_{ij} \end{cases} \tag{7}$$

    where $\pi_{ij}$ is the transition probability from $p_i$ to $p_j$, and $p$ and $p^{+}$ are the present and next one-step-ahead power demands, respectively. Furthermore, $N_{ij}$ is the transition count from $p_i$ to $p_j$, and $N_i$ is the total transition count initiated from $p_i$.
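    As a minimal illustration of (7), the following NumPy sketch estimates the TPM from a sequence of quantized power-demand levels (the quantization into $M$ levels is assumed to have been done beforehand):

```python
import numpy as np

def estimate_tpm(state_sequence, M):
    """Maximum likelihood estimate of the transition probability matrix, per (7).

    state_sequence : sequence of integer state indices in {0, ..., M-1}
                     (the quantized power-demand levels).
    """
    counts = np.zeros((M, M))
    for i, j in zip(state_sequence[:-1], state_sequence[1:]):
        counts[i, j] += 1                       # N_ij: transitions from p_i to p_j
    totals = counts.sum(axis=1, keepdims=True)  # N_i
    # States never visited get a uniform row to avoid division by zero.
    return np.divide(counts, totals,
                     out=np.full_like(counts, 1.0 / M), where=totals > 0)
```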

    All elements $\pi_{ij}$ constitute the transition probability matrix $\Pi$. For the fuzzy encoding technique, $X$ is divided into a finite set of fuzzy subsets $\Phi_j,\ j = 1, \ldots, M$, where $\Phi_j$ is a pair $(X, \mu_j(\cdot))$ and $\mu_j(\cdot)$ is a Lebesgue-measurable membership function defined as

$$\mu_j : X \to [0, 1] \quad \text{s.t.} \quad \forall p \in X,\ \exists j,\ 1 \leq j \leq M,\ \mu_j(p) > 0 \tag{8}$$

    where $\mu_j(p)$ reflects the membership degree of $p \in X$ in $\mu_j$. Note that a continuous state $p \in X$ in the fuzzy encoding may be associated with several states $p_j$ of the underlying finite-state MC model [24].

    Two transformations are involved in the FEP. The first transformation allocates an $M$-dimensional possibility (not probability) vector to each $p \in X$ as

$$\tilde{O}^{T}(p) = \mu^{T}(p) = [\mu_1(p), \mu_2(p), \ldots, \mu_M(p)]. \tag{9}$$

    This transformation is called fuzzification and maps the power demand in the space $X$ to a vector in the $M$-dimensional possibility vector space $\tilde{X}$. Note that the elements of the possibility vector $\tilde{O}(p)$ need not sum to 1.

    The second transformation is the proportional possibility-to-probability transformation, in which the possibility vector $\tilde{O}(p)$ is converted into a probability vector $O(p)$ by normalization [23], [24]:

$$O(p) = \frac{\tilde{O}(p)}{\sum_{j=1}^{M} \tilde{O}_j(p)} \tag{10}$$

    which maps $\tilde{X}$ to an $M$-dimensional probability vector space $\bar{X}$. The element $\pi_{ij}$ of the transition probability matrix (TPM) $\Pi$ is interpreted as the transition probability between $\Phi_i$ and $\Phi_j$. To decode vectors in $\bar{X}$ back to $X$, the probability distribution $O^{+}(p)$ is used to aggregate the membership functions $\mu(p)$ and encode the probability vector of the next state in $X$:

$$w^{+}(p) = (O^{+}(p))^{T}\mu(p) = (O(p))^{T}\,\Pi\,\mu(p). \tag{11}$$

    The expected value over the possibility vector leads to the next one-step ahead power demand in FEP:

$$\begin{cases} p^{+} = \displaystyle\int_X w^{+}(y)\,y\,dy \Big/ \displaystyle\int_X w^{+}(y)\,dy \\[2mm] \displaystyle\int_X w^{+}(y)\,y\,dy = \sum_{i=1}^{M} O_i(p)\sum_{j=1}^{M}\pi_{ij}\int_X y\,\mu_j(y)\,dy \\[2mm] \displaystyle\int_X w^{+}(y)\,dy = \sum_{i=1}^{M} O_i(p)\sum_{j=1}^{M}\pi_{ij}\int_X \mu_j(y)\,dy. \end{cases} \tag{12}$$

    The centroid and volume of the membership function $\mu_j(\cdot)$ are expressed as

$$\begin{cases} \bar{c}_j = \dfrac{\int_X y\,\mu_j(y)\,dy}{\int_X \mu_j(y)\,dy} \\[2mm] V_j = \displaystyle\int_X \mu_j(y)\,dy. \end{cases} \tag{13}$$

    Thus, (12) is reformulated as

$$p^{+} = \frac{\sum_{i=1}^{M} O_i(p)\sum_{j=1}^{M}\pi_{ij}V_j\bar{c}_j}{\sum_{i=1}^{M} O_i(p)\sum_{j=1}^{M}\pi_{ij}V_j} \tag{14}$$

    Expression (14) gives the predicted one-step-ahead power demand using the FEP. Fig. 4 shows an example of predictive learning used for power demand prediction. In this way, the future power demand of the vehicle over different time steps can be determined, and these data are then used to improve the management and operation of the parallel system by self-guidance.

    Figure  4.  Predictive learning for future power demand prediction.
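    Pulling (9)-(14) together, the sketch below is one possible NumPy implementation of the FEP one-step-ahead prediction; the triangular membership functions are an assumption made for illustration, not a choice prescribed by the paper:

```python
import numpy as np

def triangular_memberships(p, centers):
    """Illustrative triangular membership functions mu_j(p) over the grid `centers`."""
    width = centers[1] - centers[0]
    return np.clip(1.0 - np.abs(p - centers) / width, 0.0, 1.0)

def fep_predict(p, centers, Pi):
    """One-step-ahead power-demand prediction with the fuzzy encoding predictor.

    Implements (9)-(14): fuzzification, possibility-to-probability normalization,
    propagation through the TPM `Pi`, and defuzzification with centroids/volumes.
    """
    mu = triangular_memberships(p, centers)       # (9)  possibility vector O~(p)
    O = mu / mu.sum()                             # (10) probability vector O(p)
    c_bar = centers                               # centroids of symmetric memberships
    V = np.full(len(centers), 1.0)                # volumes (equal for identical shapes)
    w = O @ Pi                                    # O(p)^T * Pi, cf. (11)
    return np.sum(w * V * c_bar) / np.sum(w * V)  # (14)
```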

    The goal of predictive learning is to generate reasonable data from previously existing data and real-time observations of the real world. We aim to minimize the differences between real samples and generated samples by tuning the parameters of the predictive learning methodology. These generated data are then responsible for deriving various experiences and guiding the complex system through the learning process, so as to settle the data inefficiency and distribution problems.

    In the reinforcement learning framework, a learning agent interacts with a stochastic environment. We model the interaction as the quintuple $(S, A, \Pi, R, \gamma)$, where $s \in S$ and $a \in A$ are the state variable and control action sets, $\Pi$ is the transition probability matrix, $r \in R$ is the reward function, and $\gamma \in (0, 1)$ denotes a discount factor.

    The action value function $Q(s, a)$ is defined as the expected reward starting from $s$ and taking action $a$:

$$Q(s, a) = E\left\{\sum_{l=0}^{\infty}\gamma^{l}\,r_{t+1+l}\ \Big|\ s_t = s,\ a_t = a\right\}. \tag{15}$$

    The action value function associated with an optimal policy can be found by the Q-learning algorithm as in [25]

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \eta\left(r + \gamma\max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)\right). \tag{16}$$
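    For concreteness, a minimal tabular sketch of the update rule in (16), with the learning rate $\eta$ and discount factor $\gamma$ chosen purely for illustration:

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, eta=0.1, gamma=0.95):
    """One step of the tabular Q-learning update in (16).

    Q : 2-D array of action values, Q[state, action].
    """
    td_target = r + gamma * np.max(Q[s_next])   # r + gamma * max_a' Q(s', a')
    Q[s, a] += eta * (td_target - Q[s, a])      # move Q(s, a) toward the target
    return Q
```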

    When the state and action spaces are large, for example when the action $a_t$ consists of several sub-actions, modelling the Q-values $Q(s, a)$ becomes difficult. In this situation, we use both state and action representations as inputs to a deep neural network to approximate the action value function.

    A deep neural network is composed of an input layer, one or more hidden layers and an output layer. As shown in Fig. 5(a), the input vector $g = [g_1, g_2, \ldots, g_R]$ is weighted by the elements $w_1, w_2, \ldots, w_R$ and then summed with a bias $b$ to form the net input $n$ as

    Figure  5.  Deep neural network and bidirectional long short-term memory.
$$n = \sum_{i=1}^{R} w_i g_i + b. \tag{17}$$

    Then, the net input $n$ is passed through an activation function $h$ to generate the neuron output $d$:

$$d = h(n) \tag{18}$$

    where the activation function generally differs between the hidden layers ($h_1$) and the output layer ($h_2$).

    In this paper, we propose a bidirectional long short-term memory [26] based deep reinforcement network (BiLSTM-DRN) to approximate the action value function in reinforcement learning; see Fig. 5(b) for an illustration. This structure consists of a pair of deep neural networks, one for the state variable $s_t$ embedding and the other for the control sub-action $c_t^{i}$ embeddings. As the bidirectional LSTM has a larger capacity due to its nonlinear structure, we expect it to capture more detail on how the embeddings of the sub-actions are combined into an action embedding. Finally, a pairwise interaction function (e.g., inner product) is used to compute the new $Q(s_t, a_t)$ by combining the state and sub-action neuron outputs as

$$Q(s_t, a_t) = \sum_{i=1}^{K} Q(s_t, c_t^{i}) \tag{19}$$

    where $K$ is the number of sub-actions, and $Q(s_t, c_t^{i})$ represents the expected accumulated future reward obtained by including this sub-action.
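    The paper does not specify the exact network configuration; the PyTorch sketch below is one plausible reading of Fig. 5(b) and (19), in which a bidirectional LSTM embeds the sequence of sub-actions, a feed-forward branch embeds the state, and an inner product yields each $Q(s_t, c_t^{i})$ before the summation in (19):

```python
import torch
import torch.nn as nn

class BiLSTMDRN(nn.Module):
    """Illustrative BiLSTM-based deep reinforcement network for (19)."""

    def __init__(self, state_dim, sub_action_dim, embed_dim=64):
        super().__init__()
        # State branch: feed-forward embedding of s_t.
        self.state_net = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )
        # Sub-action branch: bidirectional LSTM over the K sub-actions c_t^i.
        self.action_lstm = nn.LSTM(sub_action_dim, embed_dim // 2,
                                   batch_first=True, bidirectional=True)

    def forward(self, state, sub_actions):
        # state:       (batch, state_dim)
        # sub_actions: (batch, K, sub_action_dim)
        s_embed = self.state_net(state)             # (batch, embed_dim)
        a_embed, _ = self.action_lstm(sub_actions)  # (batch, K, embed_dim)
        # Inner product gives Q(s_t, c_t^i) for each sub-action ...
        q_sub = torch.einsum('bd,bkd->bk', s_embed, a_embed)
        # ... and (19) sums them into Q(s_t, a_t).
        return q_sub.sum(dim=1)                     # (batch,)
```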

    Combining the ideas of the parallel system, transfer learning, predictive learning, deep learning and reinforcement learning, we can formulate a closed loop of data and knowledge, named parallel reinforcement learning, as described in Fig. 1. Several case studies for real-world complex system problems are introduced and discussed in the next section.

    Parallel reinforcement learning serves as a reasonable and suitable framework for analysing real-world complex systems. It consists of a self-boosting process in the parallel system, a self-adaptive process through transfer learning, a self-guided process through predictive learning, and a big-data screening and generation process through the BiLSTM-DRN. The learning process becomes more efficient and continuous in the parallel reinforcement learning framework.

    Several complex systems have been researched and analysed from the perspective of parallel reinforcement learning, such as transportation systems [27], [28] and vision systems [29]. A traffic flow prediction system was designed in [27], which inherently considered the spatial and temporal correlations. First, an artificial system, a stacked autoencoder model, was built to learn generic traffic flow features. Second, the synthetic data were trained with a layer-wise greedy method in the deep learning architecture. Finally, predictive learning was used to achieve traffic flow prediction and self-guidance for the parallel system. A survey on the development of the data-driven intelligent transportation system (D-DITS) was introduced in [28], which addressed the functionality of the key components of D-DITS and some deployment issues for its future research.

    A parallel reinforcement learning framework has also been applied to address problems in visual perception and understanding [29]. After an artificial vision system is drawn from observations of real scenes, the synthetic data can be used for feature analysis, object analysis and scene analysis. This research methodology, named parallel vision, was proposed for the perception and understanding of complex scenes.

    Furthermore, the autonomous learning system for vehicle energy efficiency improvement in [5] can also be cast in the parallel reinforcement learning framework. First, a plug-in hybrid electric vehicle was simulated to construct the parallel system. Then, historical driving records of the real vehicle were collected to autonomously learn the optimal fuel use via a deep neural network and reinforcement learning. Finally, the trained policy can guide real vehicle operation and improve control performance. A better understanding of the real vehicle can then be obtained and used to adjust the artificial system from these new experiences.

    Recently, we designed a driving cycle transformation based adaptive energy management system for a hybrid electric vehicle (HEV). There are two major difficulties in the energy management problem of an HEV. First, most energy management strategies or predefined rules cannot adapt to changing driving conditions. Second, the model-based approaches used in energy management require accurate vehicle models, which entail a considerable model parameter calibration cost. Hence, we apply the parallel reinforcement learning framework to the energy management problem of the HEV, as depicted in Fig. 6. More precisely, the core idea of this methodology is bi-level.

    Figure  6.  Parallel reinforcement learning for energy management of HEV.

    The upper level characterizes how to transform driving cycles using transfer learning by considering the induced matrix norm (IMN). Specifically, the TPMs of power demand are computed, and the IMN is employed as a critical criterion to identify differences between TPMs and to determine when the control strategy should be altered. The lower level determines the corresponding control strategies for the transferred driving cycle using a model-free reinforcement learning algorithm. In other words, we simulate the HEV as an artificial system to sample possible energy management solutions, use transfer learning to make the computed strategies adaptive to real-world driving conditions, and use reinforcement learning to generate the corresponding controls. Tests demonstrate that the proposed strategy outperforms the conventional reinforcement learning approach in both calculation speed and control performance.
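    The particular induced norm is not stated here; as a minimal sketch, assuming the induced 2-norm, the drift test on the power-demand TPM could look as follows (the threshold is purely illustrative):

```python
import numpy as np

def tpm_drift_exceeds(Pi_old, Pi_new, threshold=0.2):
    """Induced-matrix-norm criterion for updating the control strategy.

    The induced 2-norm (largest singular value) of the TPM difference is one
    possible choice; the paper does not state which induced norm it uses, and
    the threshold here is purely illustrative.
    """
    imn = np.linalg.norm(Pi_new - Pi_old, ord=2)  # induced 2-norm of the difference
    return imn > threshold
```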

    Furthermore, we construct an energy efficiency improvement system within the parallel reinforcement learning framework for a hybrid tracked vehicle (HTV). Specifically, we combine the simulated artificial vehicle with the real vehicle to constitute the parallel system, use predictive learning to realize power demand prediction for further self-guidance, and use reinforcement learning to compute the control policy. This approach also includes two layers; see Fig. 7 for a visualization of the idea. The first layer addresses how to accurately forecast the future power demand using the FEP based on MC theory, with the Kullback-Leibler (KL) divergence rate employed to quantify the differences between TPMs and to decide when the control strategy should be updated. The second layer computes the relevant control policy based on the predicted power demand and the reinforcement learning technique. Finally, comparisons show that the proposed control policy is superior to the conventional reinforcement learning approach in both energy efficiency improvement and computational speed.

    Figure  7.  Parallel reinforcement learning for energy efficiency of HTV.
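    Similarly, the KL divergence rate between two power-demand TPMs can serve as the update criterion in the first layer; the sketch below uses the standard stationary-distribution-weighted form, which may differ in detail from the exact definition used in our implementation:

```python
import numpy as np

def kl_divergence_rate(Pi_p, Pi_q, eps=1e-12):
    """KL divergence rate between two Markov chains with TPMs Pi_p and Pi_q.

    Uses the standard form sum_i mu_i * sum_j Pi_p[i, j] * log(Pi_p[i, j] / Pi_q[i, j]),
    where mu is the stationary distribution of Pi_p. Illustrative only.
    """
    # Stationary distribution of Pi_p: left eigenvector for eigenvalue 1.
    vals, vecs = np.linalg.eig(Pi_p.T)
    mu = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    mu = np.abs(mu) / np.abs(mu).sum()

    ratio = np.log((Pi_p + eps) / (Pi_q + eps))
    return float(np.sum(mu[:, None] * Pi_p * ratio))
```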

    In the future, we plan to apply the BiLSTM-DRN to process and train on large-scale real vehicle data for optimal energy management strategy computation. The objective is to realize real-time control using the parallel reinforcement learning method in our self-built tracked vehicle. More importantly, we will apply the parallel reinforcement learning framework to multiple missions of automated vehicles [30], such as decision making and trajectory planning. By addressing the existing disadvantages of traditional data-driven methods, we expect that parallel reinforcement learning can promote the development of machine learning.

    The general framework and case studies of parallel reinforcement learning for complex systems are introduced in this paper. The purpose is to build a closed loop of data and knowledge in the parallel system to guide the real system operation and improve the precision of the artificial system. In particular, the ACP approach is used to construct the parallel system, which contains an artificial system and a real system. Transfer learning is utilized to achieve driving cycle transformation by means of the mean tractive force components. Predictive learning is applied to forecast the future power demand via the fuzzy encoding predictor. To train on data with large action and state spaces, we introduce the BiLSTM-DRN to approximate the action value function in reinforcement learning.

    Data-driven models are usually viewed as components independent of the data in the learning process, which results in large-scale exploration and observation-insufficiency problems. Furthermore, the data in these models tend to be inadequate, and a general principle for organizing these models remains absent. By combining the parallel system, transfer learning, predictive learning, deep learning and reinforcement learning, we believe that parallel reinforcement learning can effectively address these problems and promote the development of machine learning.

  • [1]
    V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, "Human-level control through deep reinforcement learning, " Nature, vol. 518, no. 7540, pp. 529-533, Feb. 2015. http://europepmc.org/abstract/med/25719670
    [2]
    D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, "Mastering the game of Go with deep neural networks and tree search, " Nature, vol. 529, no. 7587, pp. 484-489, Jan. 2016. http://www.ncbi.nlm.nih.gov/pubmed/26819042
    [3]
    Y. K. Zhu, R. Mottaghi, E. Kolve, J. J. Lim, A. Gupta, F. -F. Li, and A. Farhadi, "Target-driven visual navigation in indoor scenes using deep reinforcement learning, " in Proc. 2017 IEEE Int. Conf. Robotics and Automation (ICRA), Singapore, pp. 3357-3364. http://arxiv.org/abs/1609.05143
    [4]
    I. Popov, N. Heess, T. Lillicrap, R. Hafner, G. Barth-Maron, M. Vecerik, T. Lampe, Y. Tassa, T. Erez, and M. Riedmiller, "Data-efficient deep reinforcement learning for dexterous manipulation, " arXiv: 1704.03073, 2017. http://arxiv.org/abs/1704.03073
    [5]
    X. W. Qi, Y. D. Luo, G. Y. Wu, K. Boriboonsomsin, and M. J. Barth, "Deep reinforcement learning-based vehicle energy efficiency autonomous learning system, " in Proc. Intelligent Vehicles Symp. (Ⅳ), Los Angeles, CA, USA, pp. 1228-1233, 2017. http://www.researchgate.net/publication/318800742_Deep_reinforcement_learning-based_vehicle_energy_efficiency_autonomous_learning_system
    [6]
    J. C. Caicedo and S. Lazebnik, "Active object localization with deep reinforcement learning, " in Proc. IEEE Int. Conf. Computer Vision, Santiago, Chile, 2015, pp. 2488-2496. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7410643
    [7]
    X. X. Guo, S. Singh, R. Lewis, and H. Lee, "Deep learning for reward design to improve Monte Carlo tree search in Atari games, " arXiv: 1604.07095, 2016. http://arxiv.org/abs/1604.07095
    [8]
    V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, "Playing Atari with deep reinforcement learning, " arXiv: 1312.5602, 2013.
    [9]
    J. Heinrich and D. Silver, "Deep reinforcement learning from self-play in imperfect-information games, " arXiv: 1603.01121, 2016. http://arxiv.org/abs/1603.01121
    [10]
    D. Hafner, "Deep reinforcement learning from raw pixels in doom, " arXiv: 1610.02164, 2016. http://arxiv.org/abs/1610.02164
    [11]
    K. Narasimhan, T. Kulkarni, and R. Barzilay, "Language understanding for text-based games using deep reinforcement learning, " arXiv: 1506.08941, 2015. http://arxiv.org/abs/1506.08941
    [12]
    L. Li, Y. L. Lin, N. N. Zheng, and F. Y. Wang, "Parallel learning: a perspective and a framework, " IEEE/CAA J. of Autom. Sinica, vol. 4, no. 3, pp. 389-395, Jul. 2017. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=7974888
    [13]
    F. Y. Wang, "Artificial societies, computational experiments, and parallel systems: a discussion on computational theory of complex social-economic systems, " Complex Syst. Complex. Sci., vol. 1, no. 4, pp. 25-35, Oct. 2004. http://en.cnki.com.cn/Article_en/CJFDTOTAL-FZXT200404001.htm
    [14]
    F. Y. Wang, "Toward a paradigm shift in social computing: the ACP approach, " IEEE Intell. Syst., vol. 22, no. 5, pp. 65-67, Sep. -Oct. 2007. http://ieeexplore.ieee.org/document/4338496/
    [15]
    F. Y. Wang, "Parallel control and management for intelligent transportation systems: concepts, architectures, and applications, " IEEE Trans. Intell. Transp. Syst., vol. 11, no. 3, pp. 630-638, Sep. 2010. http://ieeexplore.ieee.org/document/5549912/
    [16]
    F. Y. Wang and S. N. Tang, "Artificial societies for integrated and sustainable development of metropolitan systems, " IEEE Intell. Syst., vol. 19, no. 4, pp. 82-87, Jul. -Aug. 2004. http://ieeexplore.ieee.org/abstract/document/1333039/
    [17]
    F. Y. Wang, H. G. Zhang, and D. R. Liu, "Adaptive dynamic programming: an introduction, " IEEE Comput. Intell. Mag., vol. 4, no. 2, pp. 39-47, May 2009. http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=4840325
    [18]
    F. Y. Wang, "The emergence of intelligent enterprises: From CPS to CPSS, " IEEE Intell. Syst., vol. 25, no. 4, pp. 85-88, Jul. -Aug. 2010. http://ieeexplore.ieee.org/document/5552591/
    [19]
    F. Y. Wang, N. N. Zheng, D. P. Cao, C. M. Martinez, L. Li, and T. Liu, "Parallel driving in CPSS: a unified approach for transport automation and vehicle intelligence, " IEEE/CAA J. of Autom. Sinica, vol. 4, no. 4, pp. 577-587, Oct. 2017. http://ieeexplore.ieee.org/document/8039015/
    [20]
    K. F. Wang, C. Gou, and F. Y. Wang, "Parallel vision: an ACP-based approach to intelligent vision computing, " Acta Automat. Sin., vol. 42, no. 10, pp. 1490-1500, Oct. 2016. http://www.aas.net.cn/EN/Y2016/V42/I10/1490
    [21]
    P. Nyberg, E. Frisk, and L. Nielsen, "Driving cycle equivalence and transformation, " IEEE Trans. Veh. Technol., vol. 66, no. 3, pp. 1963-1974, Mar. 2017. http://ieeexplore.ieee.org/document/7493605/
    [22]
    P. Nyberg, E. Frisk, and L. Nielsen, "Driving cycle adaption and design based on mean tractive force, " in Proc. 7th IFAC Symp. Advanced Automatic Control, Tokyo, Japan, vol. 7, no. 1, pp. 689-694, 2013. http://www.researchgate.net/publication/271479464_Driving_Cycle_Adaption_and_Design_Based_on_Mean_Tractive_Force?ev=auth_pub
    [23]
    D. P. Filev and I. Kolmanovsky, "Generalized markov models for real-time modeling of continuous systems, " IEEE Trans. Fuzzy Syst., vol. 22, no. 4, pp. 983-998, Aug. 2014. http://ieeexplore.ieee.org/document/6588289/
    [24]
    D. P. Filev and I. Kolmanovsky, "Markov chain modeling approaches for on board applications, " in Proc. 2010 American Control Conf., Baltimore, MD, USA, pp. 4139-4145. http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=5530610
    [25]
    T. Liu, X. S. Hu, S. E. Li, and D. P. Cao, "Reinforcement learning optimized look-ahead energy management of a parallel hybrid electric vehicle, " IEEE/ASME Trans. Mechatron., vol. 22, no. 4, pp. 1497-1507, Aug. 2017. http://ieeexplore.ieee.org/document/7932983/
    [26]
    A. Graves and J. Schmidhuber, "Framewise phoneme classification with bidirectional LSTM and other neural network architectures, " Neural Netw., vol. 18, no. 5-6, pp. 602-610, Jul. -Aug. 2005. http://www.ncbi.nlm.nih.gov/pubmed/16112549
    [27]
    Y. S. Lv, Y. J. Duan, W. W. Kang, Z. X. Li, and F. Y. Wang, "Traffic flow prediction with big data: a deep learning approach, " IEEE Trans. Intell. Transp. Syst., vol. 16, no. 2, pp. 865-873, Apr. 2015. http://ieeexplore.ieee.org/document/6894591/
    [28]
    J. P. Zhang, F. Y. Wang, K. F. Wang, W. H. Lin, X. Xu, and C. Chen, "Data-driven intelligent transportation systems: a survey, " IEEE Trans. Intell. Transp. Syst., vol. 12, no. 4, pp. 1624-1639, Dec. 2011. http://ieeexplore.ieee.org/document/5959985/
    [29]
    K. F. Wang, C. Gou, N. N. Zheng, J. M. Rehg, and F. Y. Wang, "Parallel vision for perception and understanding of complex scenes: methods, framework, and perspectives, " Artif. Intell. Rev., vol. 48, no. 3, pp. 299-329, Oct. 2017. doi: 10.1007%2Fs10462-017-9569-z
    [30]
    W. Liu, Z. H. Li, L. Li, and F. Y. Wang, "Parking like a human: A direct trajectory planning solution, " IEEE Trans. Intell. Transp. Syst., vol. 18, no. 12, pp. 3388-3397, Dec. 2017. http://ieeexplore.ieee.org/document/7902173/