A journal of IEEE and CAA that publishes high-quality papers in English on original theoretical and experimental research and development in all areas of automation.

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
S. Li and C. C. Cheah, “Learning laws for deep convolutional neural networks with guaranteed convergence,” IEEE/CAA J. Autom. Sinica, 2025. doi: 10.1109/JAS.2025.125171

Learning Laws for Deep Convolutional Neural Networks With Guaranteed Convergence

doi: 10.1109/JAS.2025.125171
Funds: This work was supported by the Ministry of Education (MOE) Singapore under Academic Research Fund (AcRF) Tier 1 (RG65/22).
Abstract: Convolutional neural networks (CNNs) have shown remarkable success across numerous tasks such as image classification, yet the theoretical understanding of their convergence remains underdeveloped compared to their empirical achievements. This paper proposes the first filter learning framework with convergence-guaranteed learning laws for end-to-end learning of deep CNNs. Novel update laws with convergence analysis are formulated from the mathematical representation of each layer of the network. The proposed learning laws enable concurrent updates of the weights across all layers of a deep CNN, and the analysis shows that the training errors converge to bounds that depend on the approximation errors. Case studies on benchmark datasets show that the proposed concurrent filter learning framework guarantees convergence and offers more consistent and reliable results during training, with a trade-off in performance compared to stochastic gradient descent methods. This framework is a significant step towards enhancing the reliability and effectiveness of deep CNNs: its theoretical analysis allows practical implementation of the learning laws with automatic tuning of the learning rate to guarantee convergence during training.
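The update laws themselves appear in the paper rather than in this abstract, but the two ideas highlighted above, concurrent updates of all layers and automatic tuning of the learning rate, can be illustrated with a minimal sketch. The code below is a hypothetical stand-in, not the paper's actual learning laws: it trains a small CNN in PyTorch and chooses the step size with a Polyak-style normalized rule (loss divided by the squared gradient norm), so no learning rate is hand-picked; `TinyCNN` and `polyak_step` are illustrative names, and the batch is random dummy data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCNN(nn.Module):
    """A deliberately small CNN used only to illustrate the update scheme."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),  # one convolutional layer
            nn.ReLU(),
        )
        self.classifier = nn.Linear(8 * 28 * 28, 10)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def polyak_step(model, loss, eps=1e-8):
    # Update ALL layers concurrently with a data-dependent step size:
    # lr = loss / ||grad||^2 (Polyak's rule), a stand-in for the paper's
    # automatic learning-rate tuning, which this sketch does not reproduce.
    model.zero_grad()
    loss.backward()
    grad_sq = sum(p.grad.pow(2).sum() for p in model.parameters())
    lr = loss.item() / (grad_sq.item() + eps)
    with torch.no_grad():
        for p in model.parameters():
            p.sub_(lr * p.grad)   # same rule applied to every layer at once

model = TinyCNN()
x = torch.randn(16, 1, 28, 28)            # dummy batch standing in for images
y = torch.randint(0, 10, (16,))
for step in range(5):
    loss = F.cross_entropy(model(x), y)
    polyak_step(model, loss)
    print(f"step {step}: loss {loss.item():.4f}")
```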
