A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation.
Volume 11, Issue 10, Oct. 2024

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: X. H. Wen and M. C. Zhou, “Evolution and role of optimizers in training deep learning models,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 10, pp. 2039–2042, Oct. 2024. doi: 10.1109/JAS.2024.124806

Evolution and Role of Optimizers in Training Deep Learning Models

doi: 10.1109/JAS.2024.124806
Funds: This work was partially supported by the Guangxi Universities and Colleges Young and Middle-aged Teachers’ Scientific Research Basic Ability Enhancement Project (2023KY0055).
  • [1]
    Z. Zhang et al., “Mapping network-coordinated stacked gated recurrent units for turbulence prediction,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 6, pp. 1331–1341, 2024. doi: 10.1109/JAS.2024.124335
    [2]
    H. Liu, et al., “Aspect-based sentiment analysis: A survey of deep learning methods,” IEEE Trans. Computational Social Systems, vol. 7, no. 6, pp. 1358–1375, Dec. 2020. doi: 10.1109/TCSS.2020.3033302
    [3]
    H. Wu et al., “A PID-incorporated latent factorization of tensors approach to dynamically weighted directed network analysis,” IEEE/ CAA J. Autom. Sinica, vol. 9, no. 3, pp. 533–546, 2022. doi: 10.1109/JAS.2021.1004308
    [4]
    W. Xu et al., “Transformer-based macroscopic regulation for highspeed railway timetable rescheduling,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 9, pp. 1822–1833, 2023. doi: 10.1109/JAS.2023.123501
    [5]
    I. Goodfellow et al., Deep Learning. MIT press, 2016.
    [6]
    S. Ruder, “An overview of gradient descent optimization algorithms,” arXiv preprint arXiv: 1609.04747, 2016.
    [7]
    H. Robbins et al., “A stochastic approximation method,” The Annals of Mathematical Statistics, pp. 400–407, 1951.
    [8]
    J. Duchi et al., “Adaptive subgradient methods for online learning and stochastic optimization,” Journal of Machine Learning Research, vol. 12, p. 7, 2011.
    [9]
    T. Tieleman, “Lecture 6.5-RMSprop: Divide the gradient by a running average of its recent magnitude,” COURSERA: Neural Networks for Machine Learning, vol. 4, p. 2, 2012.
    [10]
    D. P. Kingma et al., “Adam: A method for stochastic optimization,” arXiv preprint arXiv: 1412.6980, 2014.
    [11]
    A. Vaswani et al., “Attention is all you need,” Advances in Neural Information Processing Systems, p. 30, 2017.
    [12]
    N. S. Keskar et al., “Improving generalization performance by switching from Adam to SGD,” arXiv preprint arXiv: 1712.07628, 2017.
    [13]
    L. Luo et al., “Adaptive gradient methods with dynamic bound of learning rate,” in Proc. Int. Conf. Learning Representations, 2019.
    [14]
    I. Loshchilov et al., “Decoupled weight decay regularization,” arXiv preprint arXiv: 1711.05101, 2017.
    [15]
    P. Foret et al., “Sharpness-aware minimization for efficiently improving generalization,” arXiv preprint arXiv: 2010.01412, 2020.
    [16]
    J. Kwon et al., “ASAM: Adaptive sharpness-aware minimization for scale-invariant learning of deep neural networks,” in Proc. Int. Conf. Machine Learning, 2021, pp. 5905–5914.
    [17]
    X. Xie et al., “Adan: Adaptive nesterov momentum algorithm for faster optimizing deep models,” arXiv preprint arXiv: 2208.06677, 2022.
    [18]
    J. Zhuang et al., “Adabelief optimizer: Adapting stepsizes by the belief in observed gradients,” in Proc. Conf. Neural Information Processing Systems, 2020.
    [19]
    H. Liu et al., “Sophia: A scalable stochastic second-order optimizer for language model pre-training,” arXiv preprint arXiv: 2305.14342, 2023.
    [20]
    J. Chen et al., “Hierarchical particle swarm optimization-incorporated latent factor analysis for large-scale incomplete matrices,” IEEE Trans. Big Data, vol. 8, no. 6, pp. 1524–1536, 2022.
    [21]
    M. Cui et al., “Surrogate-assisted autoencoder-embedded evolutionary optimization algorithm to solve high-dimensional expensive problems,” IEEE Trans. Evolutionary Computation, vol. 26, no. 4, pp. 676–689, 2022. doi: 10.1109/TEVC.2021.3113923
    [22]
    G. Wei et al., “A hybrid probabilistic multiobjective evolutionary algorithm for commercial recommendation systems,” IEEE Trans. Computational Social Systems, vol. 8, no. 3, pp. 589–598, 2021. doi: 10.1109/TCSS.2021.3055823
    [23]
    J. Bi et al., “Energy-optimized partial computation offloading in mobile-edge computing with genetic simulated-annealing-based particle swarm optimization,” IEEE Internet of Things Journal, vol. 8, no. 5, pp. 3774–3785, 2021. doi: 10.1109/JIOT.2020.3024223
    [24]
    S. Gao et al., “Dendritic neuron model with effective learning algorithms for classification, approximation, and prediction,” IEEE Trans. Neural Networks and Learning Systems, vol. 30, no. 2, pp. 601–614, 2018.
    [25]
    Y. Yu et al., “Improving dendritic neuron model with dynamic scalefree network-based differential evolution,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 1, pp. 99–110, 2021.
    [26]
    X. Luo et al., “Interpretability diversity for decision-tree-initialized dendritic neuron model ensemble,” IEEE Trans. Neural Networks and Learning Systems, doi: 10.1109/TNNLS.2023.3290203, 2023.
    [27]
    Y. Yu et al., “Improving dendritic neuron model with dynamic scale-free network-based differential evolution,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 1, pp. 99–110, Jan. 2022. doi: 10.1109/JAS.2021.1004284
    [28]
    S. Gao et al., “Fully complex-valued dendritic neuron model,” IEEE Trans. Neural Networks and Learning Systems, vol. 34, no. 4, pp. 2105–2118, Apr. 2023. doi: 10.1109/TNNLS.2021.3105901
    [29]
    F.-Y. Wang, “Intelligent vehicles from your HomePorts to underwaters and low attitude airspaces: SLAM for smart societies,” IEEE Trans. Intelligent Vehicles, vol. 9, no. 2, pp. 3092–3105, Feb. 2024. doi: 10.1109/TIV.2024.3373614
    [30]
    G. Yuan et al., “An autonomous vehicle group cooperation model in an urban scene,” IEEE Trans. Intelligent Transportation Systems, vol. 24, no. 12, pp. 13852–13862, Dec. 2023. doi: 10.1109/TITS.2023.3300278
    [31]
    Q. Zhao et al., “A tutorial on Internet of behaviors: Concept, architecture, technology, applications, and challenges,” IEEE Communi cations Surveys & Tutorials, vol. 25, no. 2, pp. 1227–1260, Secondquarter 2023.
    [32]
    S. Lou et al., “Human-cyber-physical system for Industry 5.0: A review from a human-centric perspective,” IEEE Trans. Automation Science and Engineering, doi: 10.1109/TASE.2024.3360476, 2024.
    [33]
    L. Vlacic et al., “Automation 5.0: The key to systems intelligence and Industry 5.0,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 8, pp. 1723–1727, Aug. 2024. doi: 10.1109/JAS.2024.124635
