A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation.
Volume 11, Issue 10, Oct. 2024

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: X. H. Wen and M. C. Zhou, “Evolution and role of optimizers in training deep learning models,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 10, pp. 2039–2042, Oct. 2024. doi: 10.1109/JAS.2024.124806

Evolution and Role of Optimizers in Training Deep Learning Models

doi: 10.1109/JAS.2024.124806
Funds: This work was partially supported by the Guangxi Universities and Colleges Young and Middle-aged Teachers’ Scientific Research Basic Ability Enhancement Project (2023KY0055).
  • [1]
    Z. Zhang et al., “Mapping network-coordinated stacked gated recurrent units for turbulence prediction,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 6, pp. 1331–1341, 2024. doi: 10.1109/JAS.2024.124335
    [2]
    H. Liu, et al., “Aspect-based sentiment analysis: A survey of deep learning methods,” IEEE Trans. Computational Social Systems, vol. 7, no. 6, pp. 1358–1375, Dec. 2020. doi: 10.1109/TCSS.2020.3033302
    [3]
    H. Wu et al., “A PID-incorporated latent factorization of tensors approach to dynamically weighted directed network analysis,” IEEE/ CAA J. Autom. Sinica, vol. 9, no. 3, pp. 533–546, 2022. doi: 10.1109/JAS.2021.1004308
    [4]
    W. Xu et al., “Transformer-based macroscopic regulation for highspeed railway timetable rescheduling,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 9, pp. 1822–1833, 2023. doi: 10.1109/JAS.2023.123501
    [5]
    I. Goodfellow et al., Deep Learning. MIT press, 2016.
    [6]
    S. Ruder, “An overview of gradient descent optimization algorithms,” arXiv preprint arXiv: 1609.04747, 2016.
    [7]
    H. Robbins et al., “A stochastic approximation method,” The Annals of Mathematical Statistics, pp. 400–407, 1951.
    [8]
    J. Duchi et al., “Adaptive subgradient methods for online learning and stochastic optimization,” Journal of Machine Learning Research, vol. 12, p. 7, 2011.
    [9]
    T. Tieleman, “Lecture 6.5-RMSprop: Divide the gradient by a running average of its recent magnitude,” COURSERA: Neural Networks for Machine Learning, vol. 4, p. 2, 2012.
    [10]
    D. P. Kingma et al., “Adam: A method for stochastic optimization,” arXiv preprint arXiv: 1412.6980, 2014.
    [11]
    A. Vaswani et al., “Attention is all you need,” Advances in Neural Information Processing Systems, p. 30, 2017.
    [12]
    N. S. Keskar et al., “Improving generalization performance by switching from Adam to SGD,” arXiv preprint arXiv: 1712.07628, 2017.
    [13]
    L. Luo et al., “Adaptive gradient methods with dynamic bound of learning rate,” in Proc. Int. Conf. Learning Representations, 2019.
    [14]
    I. Loshchilov et al., “Decoupled weight decay regularization,” arXiv preprint arXiv: 1711.05101, 2017.
    [15]
    P. Foret et al., “Sharpness-aware minimization for efficiently improving generalization,” arXiv preprint arXiv: 2010.01412, 2020.
    [16]
    J. Kwon et al., “ASAM: Adaptive sharpness-aware minimization for scale-invariant learning of deep neural networks,” in Proc. Int. Conf. Machine Learning, 2021, pp. 5905–5914.
    [17]
    X. Xie et al., “Adan: Adaptive nesterov momentum algorithm for faster optimizing deep models,” arXiv preprint arXiv: 2208.06677, 2022.
    [18]
    J. Zhuang et al., “Adabelief optimizer: Adapting stepsizes by the belief in observed gradients,” in Proc. Conf. Neural Information Processing Systems, 2020.
    [19]
    H. Liu et al., “Sophia: A scalable stochastic second-order optimizer for language model pre-training,” arXiv preprint arXiv: 2305.14342, 2023.
    [20]
    J. Chen et al., “Hierarchical particle swarm optimization-incorporated latent factor analysis for large-scale incomplete matrices,” IEEE Trans. Big Data, vol. 8, no. 6, pp. 1524–1536, 2022.
    [21]
    M. Cui et al., “Surrogate-assisted autoencoder-embedded evolutionary optimization algorithm to solve high-dimensional expensive problems,” IEEE Trans. Evolutionary Computation, vol. 26, no. 4, pp. 676–689, 2022. doi: 10.1109/TEVC.2021.3113923
    [22]
    G. Wei et al., “A hybrid probabilistic multiobjective evolutionary algorithm for commercial recommendation systems,” IEEE Trans. Computational Social Systems, vol. 8, no. 3, pp. 589–598, 2021. doi: 10.1109/TCSS.2021.3055823
    [23]
    J. Bi et al., “Energy-optimized partial computation offloading in mobile-edge computing with genetic simulated-annealing-based particle swarm optimization,” IEEE Internet of Things Journal, vol. 8, no. 5, pp. 3774–3785, 2021. doi: 10.1109/JIOT.2020.3024223
    [24]
    S. Gao et al., “Dendritic neuron model with effective learning algorithms for classification, approximation, and prediction,” IEEE Trans. Neural Networks and Learning Systems, vol. 30, no. 2, pp. 601–614, 2018.
    [25]
    Y. Yu et al., “Improving dendritic neuron model with dynamic scalefree network-based differential evolution,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 1, pp. 99–110, 2021.
    [26]
    X. Luo et al., “Interpretability diversity for decision-tree-initialized dendritic neuron model ensemble,” IEEE Trans. Neural Networks and Learning Systems, doi: 10.1109/TNNLS.2023.3290203, 2023.
    [27]
    Y. Yu et al., “Improving dendritic neuron model with dynamic scale-free network-based differential evolution,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 1, pp. 99–110, Jan. 2022. doi: 10.1109/JAS.2021.1004284
    [28]
    S. Gao et al., “Fully complex-valued dendritic neuron model,” IEEE Trans. Neural Networks and Learning Systems, vol. 34, no. 4, pp. 2105–2118, Apr. 2023. doi: 10.1109/TNNLS.2021.3105901
    [29]
    F.-Y. Wang, “Intelligent vehicles from your HomePorts to underwaters and low attitude airspaces: SLAM for smart societies,” IEEE Trans. Intelligent Vehicles, vol. 9, no. 2, pp. 3092–3105, Feb. 2024. doi: 10.1109/TIV.2024.3373614
    [30]
    G. Yuan et al., “An autonomous vehicle group cooperation model in an urban scene,” IEEE Trans. Intelligent Transportation Systems, vol. 24, no. 12, pp. 13852–13862, Dec. 2023. doi: 10.1109/TITS.2023.3300278
    [31]
    Q. Zhao et al., “A tutorial on Internet of behaviors: Concept, architecture, technology, applications, and challenges,” IEEE Communi cations Surveys & Tutorials, vol. 25, no. 2, pp. 1227–1260, Secondquarter 2023.
    [32]
    S. Lou et al., “Human-cyber-physical system for Industry 5.0: A review from a human-centric perspective,” IEEE Trans. Automation Science and Engineering, doi: 10.1109/TASE.2024.3360476, 2024.
    [33]
    L. Vlacic et al., “Automation 5.0: The key to systems intelligence and Industry 5.0,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 8, pp. 1723–1727, Aug. 2024. doi: 10.1109/JAS.2024.124635
