A journal of the IEEE and the CAA, publishing high-quality papers in English on original theoretical and experimental research and development in all areas of automation.
Volume 7 Issue 4
July 2020

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: Ali Forootani, Raffaele Iervolino, Massimo Tipaldi and Joshua Neilson, "Approximate Dynamic Programming for Stochastic Resource Allocation Problems," IEEE/CAA J. Autom. Sinica, vol. 7, no. 4, pp. 975-990, July 2020. doi: 10.1109/JAS.2020.1003231

Approximate Dynamic Programming for Stochastic Resource Allocation Problems

doi: 10.1109/JAS.2020.1003231
Abstract

A stochastic resource allocation model, based on the principles of Markov decision processes (MDPs), is proposed in this paper. In particular, a general-purpose framework is developed that takes into account resource requests for both instant and future needs. The framework can handle two types of reservations (i.e., specified and unspecified time-interval reservation requests) and implement an overbooking business strategy to further increase revenues. The resulting dynamic pricing problems can be regarded as sequential decision-making problems under uncertainty, which are solved by means of stochastic dynamic programming (DP) based algorithms. In this regard, Bellman's backward principle of optimality is exploited to provide all the implementation mechanisms for the proposed reservation pricing algorithm. The curse of dimensionality, an inevitable issue when applying DP to both instant resource requests and future resource reservations, arises. An approximate dynamic programming (ADP) technique based on linear function approximations is therefore applied to address these scalability issues. Several examples are provided to show the effectiveness of the proposed approach.
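The backward recursion underlying such reservation pricing can be illustrated on a toy model. The sketch below is not the paper's actual formulation; it assumes a hypothetical single-resource pool with capacity C, a binary accept/reject decision on each arriving request, and illustrative probabilities p_arrival and p_release, purely to show how Bellman's backward principle of optimality is applied over a finite horizon.

```python
import numpy as np

# A minimal sketch (not the paper's model) of Bellman's backward recursion
# for a finite-horizon resource allocation MDP. Hypothetical ingredients:
# states s = number of busy units in a pool of capacity C, a binary
# accept/reject decision per arriving request, at most one release per epoch.

T = 50           # number of decision epochs (illustrative)
C = 10           # resource capacity (illustrative)
p_arrival = 0.6  # probability a request arrives in an epoch (illustrative)
p_release = 0.2  # probability one busy unit is released in an epoch
price = 1.0      # revenue collected per accepted request (illustrative)

V = np.zeros((T + 1, C + 1))  # V[t, s]: optimal expected revenue-to-go

for t in range(T - 1, -1, -1):
    for s in range(C + 1):
        # expected continuation value from occupancy level s_next,
        # accounting for a possible single release before the next epoch
        def cont(s_next):
            s_next = min(max(s_next, 0), C)
            return (p_release * V[t + 1, max(s_next - 1, 0)]
                    + (1 - p_release) * V[t + 1, s_next])

        reject = cont(s)
        accept = price + cont(s + 1) if s < C else -np.inf
        # with probability p_arrival a request arrives and the better of
        # accept/reject is taken; otherwise the system simply evolves
        V[t, s] = p_arrival * max(accept, reject) + (1 - p_arrival) * cont(s)

print("Optimal expected revenue from an empty system:", V[0, 0])
```

Exact tabulation over all (t, s) pairs is what becomes intractable as the state space grows, which is precisely the scalability issue the ADP approximation targets.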

     



Figures (8) / Tables (5)

    Article Metrics

Article views: 1174. PDF downloads: 95.

    Highlights

    • An MDP-based resource allocation problem is proposed.
    • MPC is considered within the MDP framework.
    • Algorithms suitable for computer implementation are proposed.
    • Compressive sampling is considered for ADP.
    • A linear architecture is considered for ADP (see the sketch below).
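To make the last two highlights concrete, the following sketch shows one generic way a linear architecture can be used in ADP: the value function is approximated as V(s) ≈ phi(s)ᵀr with a fixed feature map phi, and the weights r are re-fitted by least squares on Bellman backups computed at a sampled subset of states only. The feature map, parameters, and toy dynamics are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

C = 100       # capacity large enough that exact DP over all states is costly
gamma = 0.95  # discount factor (illustrative)

def phi(s):
    """Hypothetical linear architecture: polynomial features in the
    normalized occupancy level, so V(s) is approximated by phi(s) @ r."""
    x = s / C
    return np.array([1.0, x, x * x])

r = np.zeros(3)  # weights of the linear value-function approximation

# Approximate value iteration with a linear architecture: in each sweep,
# compute one-step Bellman backups at sampled states, then project them
# back onto the feature space via least squares.
for _ in range(200):
    states = rng.integers(0, C + 1, size=64)       # sampled states only
    targets = np.empty(states.size)
    for i, s in enumerate(states):
        reward = 1.0 if s < C else 0.0             # toy one-step revenue
        s_next = min(s + 1, C)                     # toy deterministic move
        targets[i] = reward + gamma * (phi(s_next) @ r)
    Phi = np.stack([phi(s) for s in states])
    r, *_ = np.linalg.lstsq(Phi, targets, rcond=None)

print("Fitted weights of the linear architecture:", r)
```

The handful of fitted weights (here three) replaces a table over the full state space, which is how a linear architecture sidesteps the curse of dimensionality at the cost of approximation error.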
