A journal of IEEE and CAA that publishes high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 11 Issue 2
Feb.  2024

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 11.8, Top 4% (SCI Q1)
    CiteScore: 17.6, Top 3% (Q1)
    Google Scholar h5-index: 77, TOP 5
O. Dogru, J. Xie, O. Prakash, R. Chiplunkar, J. Soesanto, H. Chen, K. Velswamy, F. Ibrahim, and B. Huang, “Reinforcement learning in process industries: Review and perspective,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 2, pp. 283–300, Feb. 2024. doi: 10.1109/JAS.2024.124227

Reinforcement Learning in Process Industries: Review and Perspective

doi: 10.1109/JAS.2024.124227
Funds: This work was supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC).
  • This survey paper provides a review and perspective on intermediate and advanced reinforcement learning (RL) techniques in process industries. It takes a holistic approach by covering all levels of the process control hierarchy. The paper presents a comprehensive overview of RL algorithms, including fundamental concepts such as Markov decision processes and the main approaches to RL (value-based, policy-based, and actor-critic methods), and discusses the relationship between classical control and RL. It further reviews the wide-ranging applications of RL in process industries, such as soft sensors, low-level control, high-level control, distributed process control, fault detection and fault-tolerant control, optimization, planning, scheduling, and supply chain. The paper discusses the limitations and advantages, trends and new applications, and opportunities and future prospects for RL in process industries. Moreover, it highlights the need for a holistic approach in complex systems due to the growing importance of digitalization in the process industries.

     



    Figures (7) / Tables (3)

    Article Metrics

    Article views (523) / PDF downloads (158)

    Highlights

    • Reviews the link between modern reinforcement learning techniques and process industries by considering the control hierarchy holistically
    • Presents state-of-the-art theoretical advancements alongside relevant applications in numerous industries
    • Discusses limitations, advantages, trends, new applications, opportunities, and future prospects that can help researchers and practitioners
