IEEE/CAA Journal of Automatica Sinica
Citation: F.-Y. Wang, J. Zhang, Q. L. Wei, X. H. Zheng, and L. Li, "PDP: Parallel Dynamic Programming," IEEE/CAA J. Autom. Sinica, vol. 4, no. 1, pp. 1-5, 2017.
Deep reinforcement learning is an active research area in artificial intelligence, and the principle of optimality in dynamic programming is key to the success of reinforcement learning methods. This paper first presents the principle of adaptive dynamic programming (ADP) as an alternative to direct dynamic programming (DP) and develops the inherent relationship between ADP and deep reinforcement learning. Next, it discusses analytics intelligence as a necessary requirement for real reinforcement learning. Finally, it presents the principle of parallel dynamic programming, which integrates dynamic programming and analytics intelligence, as the future of computational intelligence.