Two-Dimensional Model-Free Off-Policy Optimal Iterative Learning Control for Time-Varying Batch Systems

Jianan Liu; Zike Zhou; Jinglin Huang; Wenjing Hong; Jia Shi

doi:10.1109/JAS.2025.125399

Volume 13 Issue 3

Mar. 2026

IEEE/CAA Journal of Automatica Sinica

JCR Impact Factor: 19.2, Top 1 (SCI Q1)

CiteScore: 28.2, Top 1% (Q1)
Google Scholar h5-index: 95, Top 5

J. Liu, Z. Zhou, J. Huang, W. Hong, and J. Shi, “Two-dimensional model-free off-policy optimal iterative learning control for time-varying batch systems,” IEEE/CAA J. Autom. Sinica, vol. 13, no. 3, pp. 692–703, Mar. 2026. doi: 10.1109/JAS.2025.125399

Author Bio:
Jianan Liu received the B.S. degree in mechanical and electrical engineering from Xidian University in 2015 and the M.S. degree in mechatronics and information technology from Karlsruhe Institute of Technology, Karlsruhe, Germany, in 2020. He is currently pursuing the Ph.D. degree at the Institute of Artificial Intelligence, Xiamen University. His current research interests include iterative learning control, reinforcement learning, and intelligent control.

Zike Zhou received the B.S. degree in mechanical engineering from Southwest Jiaotong University and the Leeds Joint School in Chengdu. She is currently pursuing a master’s degree at the Institute of Artificial Intelligence, Xiamen University. Her research interests include sliding mode control, reinforcement learning, and the control of robotic manipulators.

Jinglin Huang received the B.S. degree in chemical engineering and technology from Xiamen University in 2022. She is currently pursuing the M.S. degree in the Department of Chemical and Biochemical Engineering, Xiamen University. Her current research interests include model predictive control, reinforcement learning, and intelligent control.

Wenjing Hong is a Full Professor and Vice Dean at the College of Chemistry and Chemical Engineering and a Professor at the College of Materials and the Institute of Artificial Intelligence of Xiamen University. He received the B.Sc. degree from Xiamen University in 2007, the M.Sc. degree from Tsinghua University in 2009, and the Ph.D. degree in science (summa cum laude) from the University of Bern, Bern, Switzerland, in 2013. After his postdoctoral stay at the University of Bern, he joined Xiamen University as a Full Professor and Group Leader in 2015. He also serves as the Senior Editor of Langmuir. His current research interests center on artificial intelligence for science (AI4S), including the fundamental science of single-molecule electronics and applied science in new materials and energy storage systems.

Jia Shi received the M.Sc. degree in operational research and cybernetics from Xiamen University in 1997 and the Ph.D. degree in control science and engineering from Zhejiang University in 2006. From 2003 to 2006, he worked as a Research Assistant in the Department of Chemical Engineering, the Hong Kong University of Science and Technology, Hong Kong, China. Since 2008, he has been an Associate Professor in the Department of Chemical and Biochemical Engineering, Xiamen University. His current research interests include intelligent learning control and optimization for complex industrial processes, including iterative learning control, reinforcement learning, and composite intelligent control combined with various advanced control techniques.
Corresponding author: Jia Shi, e-mail: jshi@xmu.edu.cn
Received Date: 2024-12-25
Accepted Date: 2025-02-28

Available Online: 2025-07-07

Abstract

Although iterative learning control (ILC) has been widely used in batch processes, designing an optimal ILC scheme for batch systems with unknown dynamics and time-varying parameters remains an open problem. In this paper, we propose a novel two-dimensional model-free off-policy optimal ILC scheme that achieves optimal control performance for linear time-varying batch systems. First, the one-dimensional state space is expanded to a two-dimensional state space by integrating time and batch information. Then, based on dynamic programming and a recursive algorithm, a framework for two-dimensional model-based optimal ILC is established. Building on this framework, two-dimensional model-free optimal ILC is developed using Q-learning, a model-free reinforcement learning method. The optimal ILC policy is obtained through online off-policy iteration using historical and online operation data. A rigorous convergence proof of the model-free optimal ILC law is also presented. Finally, simulation results on an injection molding batch process demonstrate the effectiveness and feasibility of the proposed control scheme and a significant improvement in control performance.
Keywords:
- Linear time-varying batch processes
- model-free
- off-policy Q-learning
- optimal iterative learning control
- two-dimensional batch systems
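
The abstract builds its model-free method on a model-based step that combines dynamic programming with a recursive algorithm. As a rough illustration of that kind of recursion, the sketch below runs the textbook finite-horizon linear-quadratic backward pass for a generic linear time-varying system x[t+1] = A[t] x[t] + B[t] u[t]. It is not the paper's two-dimensional formulation or its Q-learning algorithm: the function names, the cost weights `Qc`, `Rc`, `Qf`, and the example matrices are all assumptions chosen for illustration, and the model is assumed known.

```python
import numpy as np


def backward_q_recursion(A_seq, B_seq, Qc, Rc, Qf):
    """Textbook finite-horizon LQ dynamic programming (illustrative only).

    At each step the Q-function is quadratic, Q_t(x, u) = [x; u]^T H_t [x; u],
    and minimizing it over u gives the time-varying feedback u = -K_t x.
    Returns the list of gains K_t and the cost-to-go matrix P_0.
    """
    T = len(A_seq)
    P = Qf.copy()  # terminal cost-to-go matrix
    gains = [None] * T
    for t in reversed(range(T)):
        A, B = A_seq[t], B_seq[t]
        # Blocks of the Q-function matrix H_t
        H_xx = Qc + A.T @ P @ A
        H_xu = A.T @ P @ B
        H_uu = Rc + B.T @ P @ B
        # K_t = H_uu^{-1} H_ux, obtained by solving a linear system
        K = np.linalg.solve(H_uu, H_xu.T)
        gains[t] = K
        # Cost-to-go after substituting the minimizing input
        P = H_xx - H_xu @ K
    return gains, P


def rollout_cost(A_seq, B_seq, Qc, Rc, Qf, gains, x0):
    """Simulate u = -K_t x from x0 and accumulate the quadratic cost."""
    x = x0.copy()
    cost = 0.0
    for A, B, K in zip(A_seq, B_seq, gains):
        u = -K @ x
        cost += float(x @ Qc @ x + u @ Rc @ u)
        x = A @ x + B @ u
    return cost + float(x @ Qf @ x)


# Hypothetical time-varying example (values are made up for illustration)
T = 20
A_seq = [np.array([[1.0, 0.1], [0.0, 0.9 + 0.05 * np.sin(0.3 * t)]]) for t in range(T)]
B_seq = [np.array([[0.0], [0.1 * (1.0 + 0.5 * np.cos(0.2 * t))]]) for t in range(T)]
Qc, Rc, Qf = np.eye(2), np.array([[0.1]]), np.eye(2)
x0 = np.array([1.0, -0.5])

gains, P0 = backward_q_recursion(A_seq, B_seq, Qc, Rc, Qf)
print("predicted optimal cost:", float(x0 @ P0 @ x0))
print("simulated cost:        ", rollout_cost(A_seq, B_seq, Qc, Rc, Qf, gains, x0))
```

The two printed values should agree up to rounding, since the cost-to-go matrix predicts exactly the cost of following the computed gains. A model-free variant, as described in the abstract, would estimate the Q-function from operation data instead of from `A_seq` and `B_seq`.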
