PDP: Parallel Dynamic Programming

Fei-Yue Wang; Jie Zhang; Qinglai Wei; Xinhu Zheng; Li Li

Volume 4 Issue 1

Jan. 2017

IEEE/CAA Journal of Automatica Sinica

F.-Y. Wang, J. Zhang, Q. L. Wei, X. H. Zheng, and L. Li, "PDP: Parallel Dynamic Programming," IEEE/CAA J. Autom. Sinica, vol. 4, no. 1, pp. 1-5, Jan. 2017.

Fei-Yue Wang^{1, 2, 3}, Jie Zhang^{1, 4}, Qinglai Wei^{1, 2}, Xinhu Zheng^{5}, Li Li^{6}

1. The State Key Laboratory of Management and Control for Complex Systems (SKL-MCCS), Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing 100190, China
2. School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
3. Research Center for Military Computational Experiments and Parallel Systems Technology, National University of Defense Technology, Changsha 410073, China
4. Qingdao Academy of Intelligent Industries, Shandong 266000, China
5. Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55414, USA
6. Department of Automation, Tsinghua University, Beijing 100084, China

Funds:

This work was supported by the National Natural Science Foundation of China under Grants 61533019, 61374105, 71232006, 61233001, and 71402178.

Author Bio:
Jie Zhang (M'16) is an associate professor with The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences. His current research interests include mechanism design and optimal control in e-commerce and traffic systems. He received his Ph.D. degree in Technology of Computer Application from University of Chinese Academy of Sciences in 2015. He received his B.Sc. degree in Information and Computing Science from Tsinghua University in 2005, and his M.Sc. degree in Operations Research and Control Theory from Renmin University of China in 2009. (e-mail: feiyue.wang@ia.ac.cn)

Qinglai Wei (M'11) received the Ph.D. degree in control theory and control engineering from Northeastern University, Shenyang, China, in 2009. From 2009 to 2011, he was a postdoctoral fellow with The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China. He is currently a Professor of the institute. He is also a Professor of the University of Chinese Academy of Sciences. He has authored two books and published over 60 international journal papers. His research interests include adaptive dynamic programming, neural-networks-based control, optimal control, nonlinear systems and their industrial applications. Dr. Wei has been an Associate Editor of IEEE Transactions on Systems, Man, and Cybernetics: Systems since 2016, Information Sciences since 2016, Neurocomputing since 2016, Optimal Control Applications and Methods since 2016, and Acta Automatica Sinica since 2015, and held the same position for IEEE Transactions on Neural Networks and Learning Systems during 2014-2015. He has been the Secretary of the IEEE Computational Intelligence Society (CIS) Beijing Chapter since 2015. (e-mail: qinglai.wei@ia.ac.cn)

Xinhu Zheng received the B.S. degree in control science and engineering from Zhejiang University, Hangzhou, China, in 2011. He is currently working toward the Ph.D. degree in computer science and engineering at the University of Minnesota, Minneapolis, MN, USA. His research interests include social computing, machine learning, and data analytics. (e-mail: zheng473@umn.edu)

Li Li (S'05-M'06-SM'10-F'17) is currently an associate professor with the Department of Automation, Tsinghua University, China. His research interests include complex and networked systems, intelligent control and sensing, intelligent transportation systems and intelligent vehicles. Dr. Li has published over 50 SCI-indexed international journal papers and over 50 international conference papers as a first/corresponding author. He serves as an Associate Editor for IEEE Transactions on Intelligent Transportation Systems. (e-mail: li-li@tsinghua.edu.cn)
Corresponding author: Fei-Yue Wang (S'87-M'89-SM'94-F'03) received his Ph.D. in Computer and Systems Engineering from Rensselaer Polytechnic Institute, Troy, New York in 1990. He joined the University of Arizona in 1990 and became a Professor and Director of the Robotics and Automation Lab (RAL) and Program in Advanced Research for Complex Systems (PARCS). In 1999, he founded the Intelligent Control and Systems Engineering Center at the Institute of Automation, Chinese Academy of Sciences (CAS), Beijing, China, under the support of the Outstanding Overseas Chinese Talents Program from the State Planning Council and the "100 Talent Program" from CAS, and in 2002, was appointed as the Director of the Key Lab of Complex Systems and Intelligence Science, CAS. In 2011, he became the State Specially Appointed Expert and the Director of The State Key Laboratory of Management and Control for Complex Systems. Dr. Wang's current research focuses on methods and applications for parallel systems, social computing, and knowledge automation. He was the Founding Editor-in-Chief (EiC) of the International Journal of Intelligent Control and Systems (1995-2000), Founding EiC of IEEE ITS Magazine (2006-2007), EiC of IEEE Intelligent Systems (2009-2012), and EiC of IEEE Transactions on ITS (2009-2016). Currently he is EiC of China's Journal of Command and Control. Since 1997, he has served as General or Program Chair of more than 20 IEEE, INFORMS, ACM, and ASME conferences. He was the President of IEEE ITS Society (2005-2007), Chinese Association for Science and Technology (CAST, USA) in 2005, the American Zhu Kezhen Education Foundation (2007-2008), and the Vice President of the ACM China Council (2010-2011). Since 2008, he has been the Vice President and Secretary General of Chinese Association of Automation. Dr. Wang is an elected Fellow of IEEE, INCOSE, IFAC, ASME, and AAAS. In 2007, he received the 2nd Class National Prize in Natural Sciences of China and was awarded Outstanding Scientist by ACM for his work in intelligent control and social computing. He received IEEE ITS Outstanding Application and Research Awards in 2009 and 2011, and the IEEE SMC Norbert Wiener Award in 2014. Corresponding author of this paper. (e-mail: feiyue.wang@ia.ac.cn)
Received Date: 2015-11-11
Accepted Date: 2016-12-21

Abstract

Deep reinforcement learning is a major research focus in artificial intelligence. The principle of optimality in dynamic programming is key to the success of reinforcement learning methods. The principle of adaptive dynamic programming (ADP) is first presented in place of direct dynamic programming (DP), and the inherent relationship between ADP and deep reinforcement learning is developed. Next, analytics intelligence is discussed as a necessary requirement for real reinforcement learning. Finally, the principle of parallel dynamic programming, which integrates dynamic programming and analytics intelligence, is presented as the future of computational intelligence.
Keywords:
- Parallel dynamic programming
- Dynamic programming
- Adaptive dynamic programming
- Reinforcement learning
- Deep learning
- Neural networks
- Artificial intelligence
