IEEE/CAA Journal of Automatica Sinica
Citation: | Ali Forootani, Raffaele Iervolino, Massimo Tipaldi and Joshua Neilson, "Approximate Dynamic Programming for Stochastic Resource Allocation Problems," IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 975-990, July 2020. doi: 10.1109/JAS.2020.1003231 |
[1] |
D. Bertsimas, S. Gupta, and G. Lulli, “Dynamic resource allocation: A flexible and tractable modeling framework,” European J. Operational Research, vol. 236, pp. 14–26, 2014. doi: 10.1016/j.ejor.2013.10.063
|
[2] |
S. M. Kandil, H. E. Z. Farag, M. F. Shaaban, and M. Z. El-Sharafy, “A combined resource allocation framework for PEVs charging stations, renewable energy resources and distributed energy storage systems,” Energy, vol. 143, pp. 961–972, 2018. doi: 10.1016/j.energy.2017.11.005
|
[3] |
J. Acimovic and S. C. Graves, “Making better fulfillment decisions on the fly in an online retail environment,” Manufacturing&Service Operations Management, vol. 17, no. 1, pp. 1–18, 2014.
|
[4] |
T. Liu, B. Tian, Y. Ai, Y. Zou, and F. Y. Wang, “Parallel reinforcement learning-based energy efficiency improvement for a cyber-physical system,” IEEE/CAA J. Autom. Sinica, vol. 7, no. 2, pp. 617–626, 2020.
|
[5] |
L. Jiang, H. Huang, and Z. Ding, “Path planning for intelligent robots based on deep Q-learning with experience replay and heuristic knowledge,” IEEE/CAA J. Autom. Sinica, 2019. doi: 10.1109/JAS.2019.1911732
|
[6] |
A. Pietrabissa, F. D. Priscoli, A. D. Giorgio, A. Giuseppi, M. Panfili, and V. Suraci, “An approximate dynamic programming approach to resource management in multi-cloud scenarios,” Int. J. Control, vol. 90, pp. 492–503, 2017. doi: 10.1080/00207179.2016.1185802
|
[7] |
D. P. Bertsekas, “Dynamic Programming and Optimal Control, Vol. I” Athena Scientific, Belmont, Massachusetts, USA, 2017. [Online]. Available: http://www.athenasc.com/dpbook.html
|
[8] |
D. P. Bertsekas, “Dynamic Programming and Optimal Control, Vol. II” Athena Scientific, Belmont, Massachusetts, USA, 2012. [Online]. Available: http://www.athenasc.com/dpbook.html
|
[9] |
C. Quan and F. Ren, “Weighted high-order hidden Markov models for compound emotions recognition in text,” Information Sciences, vol. 329, pp. 581–596, 2016. doi: 10.1016/j.ins.2015.09.050
|
[10] |
V. M. Baskaran, Y. C. Chang, J. Loo, K. Wong, and M. Gan, “Dominant speaker detection in multipoint video communication using Markov chain with non-linear weights and dynamic transition window,” Information Sciences, vol. 464, pp. 344–362, 2018.
|
[11] |
A. Pietrabissa, M. Castrucci, and A. Palo, “An MDP approach to faulttolerant routing,” European J. Control, vol. 18, no. 4, pp. 334–347, 2012. doi: 10.3166/ejc.18.334-347
|
[12] |
D. P. Bertsekas, “Feature-based aggregation and deep reinforcement learning: A survey and some new implementations,” IEEE/CAA J. Autom. Sinica, vol. 6, no. 1, pp. 1–31, 2019. doi: 10.1109/JAS.2018.7511249
|
[13] |
D. P. Bertsekas, “Dynamic programming and suboptimal control: A survey from ADP to MPC,” European J. Control, vol. 11, no. 4, pp. 310–334, 2005.
|
[14] |
Z. Zhang and D. Zhao, “Clique-based cooperative multiagent reinforcement learning using factor graphs,” IEEE/CAA J. Autom. Sinica, vol. 1, no. 3, pp. 248–256, 2014. doi: 10.1109/JAS.2014.7004682
|
[15] |
M. Kumar, K. Rajagopal, S. N. Balakrishnan, and N. T. Nguyen, “Reinforcement learning based controller synthesis for flexible aircraft wings,” IEEE/CAA J. Autom. Sinica, vol. 1, no. 4, pp. 435–448, 2014. doi: 10.1109/JAS.2014.7004670
|
[16] |
A. Forootani, M. Tipaldi, M. G. Zarch, D. Liuzza, and L. Glielmo, “Modelling and solving resource allocation problems via a dynamic programming approach,” Int. J. Control, 2019. doi: 10.1080/00207179.2019.1661521
|
[17] |
A. Forootani, D. Liuzza, M. Tipaldi, L. Glielmo, “Allocating resources via price management systems: a dynamic programming-based approach,” Int. Journal of Control, 2019. doi: 10.1080/00207179.2019.1694178
|
[18] |
D. P. Bertsekas, “Temporal difference methods for general projected equations,” IEEE Trans. Automatic Control, vol. 56, no. 9, pp. 2128–2139, 2011. doi: 10.1109/TAC.2011.2115290
|
[19] |
A. Forootani, R. Iervolino, and M. Tipaldi, “Applying unweighted least squares based techniques to stochastic dynamic programming: Theory and application,” IET Control Theory&Applications, vol. 13, no. 15, pp. 2387–2398, 2019.
|
[20] |
D. Bertsekas, “Approximate policy iteration: A survey and some new methods,” J. Control Theory and Applications, vol. 9, no. 3, pp. 310–335, 2011. doi: 10.1007/s11768-011-1005-3
|
[21] |
D. P. Bertsekas and J. N. Tsitsiklis, Introduction to Probability, Athena Scientific Optimization and Computation Series, 2nd. ed., Belmont, Massachusetts, USA, 2008.
|
[22] |
D. P. de Farias and B. Van Roy, “The linear programming approach to approximate dynamic programming,” Operations Research, vol. 51, no. 6, pp. 850–865, 2003. doi: 10.1287/opre.51.6.850.24925
|
[23] |
H. Yu and D. P. Bertsekas, “Convergence results for some temporal difference methods based on least squares,” IEEE Trans. Automatic Control, vol. 54, no. 7, pp. 1515–1531, 2009. doi: 10.1109/TAC.2009.2022097
|
[24] |
L. Busoniu, D. Ernst, B. De Schutter, and R. Babuska, “Cross-entropy optimization of control policies with adaptive basis functions,” IEEE Trans. Systems,Man,and Cybernetics - Part B:Cybernetics, vol. 41, no. 1, pp. 196–209, 2011. doi: 10.1109/TSMCB.2010.2050586
|
[25] |
D. P. Bertsekas and H. Yu, “Projected equation methods for approximate solution of large linear systems,” J. Computational and Applied Mathematics, vol. 227, pp. 27–50, 2009. doi: 10.1016/j.cam.2008.07.037
|
[26] |
A. R. Andersen, B. F. Nielson, and L. B. Reinhardt, “Optimization of hospital ward resources with patient relocation using Markov chain modeling,” European J. Operational Research, vol. 260, no. 3, pp. 1152–1163, 2017. doi: 10.1016/j.ejor.2017.01.026
|
[27] |
Z. Huang, W. M. Van Der Aalst, Xu. Lu, and H. Duan, “Reinforcement learning based resource allocation in business process management,” Data&Knowledge Engineering, vol. 70, no. 1, pp. 127–145, 2011.
|
[28] |
K. Zheng, H. Meng, P. Chatzimisios, L. Lei, and X. Shen, “An SMDP based resource allocation in vehicular cloud computing systems,” IEEE Trans. Industrial Electronics, vol. 62, no. 12, pp. 7920–7928, 2015. doi: 10.1109/TIE.2015.2482119
|
[29] |
D. Astaraky and J. Patrick, “A simulation based approximate dynamic programming approach to multi-class multi resource surgical scheduling,” European J. Operational Research, vol. 245, no. 1, pp. 309–319, 2015. doi: 10.1016/j.ejor.2015.02.032
|
[30] |
R. Pourmoayed and L. R. Nielsen, “An approximate dynamic programming approach for sequential pig marketing decisions at herd level,” European J. Operational Research, vol. 276, no. 3, pp. 1056–1070, 2019. doi: 10.1016/j.ejor.2019.01.050
|
[31] |
S. Fianu and L. B. Davis, “A Markov decision process model for equitable distribution of supplies under uncertainty,” European J. Operational Research, vol. 264, pp. 1101–1115, 2018. doi: 10.1016/j.ejor.2017.07.017
|
[32] |
R. W. Lien, S. M. R. Iravani, and K. R. smilowitz, “Sequential resource allocation for nonprofit operations,” Operations Research, vol. 62, no. 2, pp. 301–317, 2014. doi: 10.1287/opre.2013.1244
|
[33] |
Z. Deng, “Distributed algorithm design for resource allocation problems of second-order multi-agent systems over weight-balanced digraphs,” IEEE Trans. Systems,Man,and Cybernetics, 2019. doi: 10.1109/TSMC.2019.2930672
|
[34] |
R. Shone, K. Glazebrook, and K. G. Zografos, “Resource allocation in congested queueing systems with time-varying demand: An application to airport operations,” European J. Operational Research, vol. 276, no. 2, pp. 566–581, 2019. doi: 10.1016/j.ejor.2019.01.024
|
[35] |
D. Bertsimas, J. D. Grifth, V. Gupta, M. J. Kochenderfer, and V. V. Misic, “Comparison of Monte Carlo tree search and rolling horizon optimization for large-scale dynamic resource allocation problems,” European J. Operational Research, vol. 263, no. 2, pp. 664–678, 2017. doi: 10.1016/j.ejor.2017.05.032
|
[36] |
A. Bayram, S. Solak, and M. Johnson, “Stochastic models for strategic resource allocation in non profit foreclosed housing acquisitions,” European J. Operational Research, vol. 233, no. 1, pp. 246–262, 2014. doi: 10.1016/j.ejor.2013.08.040
|
[37] |
D. Castanon, J. M. Wohletz, “Model predictive control for stochastic resource allocation,” IEEE Trans. Automatic Control, vol. 54, no. 8, pp. 1739–1750, 2009. doi: 10.1109/TAC.2009.2024562
|
[38] |
M. P. Dal Pont, R. S. Ferreira, W. W. Teixeira, D. M. Lima, and J. E. Normey-Rico, “MPC with machine learning applied to resource allocation problem using Lambda architecture,” IFAC-Papers OnLine, vol. 52, no. 1, pp. 550–555, 2019. doi: 10.1016/j.ifacol.2019.06.120
|