A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 10 Issue 3
Mar. 2023

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 11.8, Top 4% (SCI Q1)
  • CiteScore: 17.6, Top 3% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: J. Hou, X. Zeng, G. Wang, J. Sun, and J. Chen, “Distributed momentum-based Frank-Wolfe algorithm for stochastic optimization,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 3, pp. 685–699, Mar. 2023. doi: 10.1109/JAS.2022.105923

Distributed Momentum-Based Frank-Wolfe Algorithm for Stochastic Optimization

doi: 10.1109/JAS.2022.105923
Funds:  This work was supported in part by the National Key R&D Program of China (2021YFB1714800), the National Natural Science Foundation of China (62222303, 62073035, 62173034, 61925303, 62088101, 61873033), the CAAI-Huawei MindSpore Open Fund, and the Chongqing Natural Science Foundation (2021ZX4100027)
Abstract

This paper considers distributed stochastic optimization, in which a number of agents cooperate to optimize a global objective function through local computations and information exchanges with neighbors over a network. Stochastic optimization problems are usually tackled by variants of projected stochastic gradient descent. However, projecting a point onto a feasible set is often expensive. The Frank-Wolfe (FW) method has well-documented merits in handling convex constraints, but existing stochastic FW algorithms are mostly developed for centralized settings. In this context, the present work puts forth a distributed stochastic Frank-Wolfe solver, by judiciously combining Nesterov’s momentum and gradient tracking techniques for stochastic convex and nonconvex optimization over networks. It is shown that the convergence rate of the proposed algorithm is ${\cal{O}}(k^{-\frac{1}{2}})$ for convex optimization, and ${\cal{O}}(1/\log_2(k))$ for nonconvex optimization. The efficacy of the algorithm is demonstrated by numerical simulations against a number of competing alternatives.
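To make the algorithmic idea concrete, below is a minimal Python sketch of one plausible realization of a distributed momentum-based Frank-Wolfe iteration, combining a consensus (mixing) step, momentum-averaged stochastic gradients, gradient tracking, and a projection-free linear minimization oracle (LMO). This is an illustration under assumptions, not the paper’s exact recursion: the mixing matrix `W`, the step-size schedules `rho`/`gamma`, and the l1-ball oracle `lmo_l1` are hypothetical choices for illustration.

```python
import numpy as np

def lmo_l1(grad, radius=1.0):
    """Linear minimization oracle over the l1 ball:
    argmin_{||s||_1 <= radius} <grad, s>, available in closed form."""
    s = np.zeros_like(grad)
    i = np.argmax(np.abs(grad))
    s[i] = -radius * np.sign(grad[i])
    return s

def distributed_momentum_fw(stoch_grads, W, x0, n_iters=1000, radius=1.0):
    """One plausible distributed momentum-based stochastic FW loop (a sketch).

    stoch_grads: list of callables; stoch_grads[i](x) returns a stochastic
                 gradient of agent i's local objective at x.
    W:           doubly stochastic mixing matrix of the communication graph.
    x0:          (n_agents, dim) array of feasible initial iterates.
    """
    x = x0.copy()
    n = x.shape[0]
    d = np.zeros_like(x)             # momentum-averaged stochastic gradients
    y = np.zeros_like(x)             # gradient-tracking estimates of the global gradient
    for k in range(1, n_iters + 1):
        rho = 1.0 / k ** 0.5         # hypothetical momentum weight
        gamma = 2.0 / (k + 2)        # hypothetical FW step size
        x = W @ x                    # consensus: average iterates with neighbors
        d_prev = d.copy()
        for i in range(n):
            g = stoch_grads[i](x[i])
            d[i] = (1.0 - rho) * d[i] + rho * g   # momentum averaging
        y = W @ y + d - d_prev       # gradient tracking update
        for i in range(n):
            s = lmo_l1(y[i], radius)              # projection-free LMO step
            x[i] += gamma * (s - x[i])            # Frank-Wolfe update
    return x
```

The projection-free character is visible in the last two lines of the loop: instead of projecting onto the feasible set, each agent solves a linear program over it (here, in closed form for the l1 ball) and moves toward the minimizer. With, e.g., `stoch_grads[i] = lambda x: A[i].T @ (A[i] @ x - b[i]) + 0.01 * np.random.randn(x.size)` (a noisy local least-squares gradient, again hypothetical), the routine mimics the kind of decentralized sparse regression experiment on which such algorithms are typically tested.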




    Highlights

    • A distributed stochastic Frank-Wolfe algorithm is proposed for stochastic optimization problems by judiciously combining Nesterov’s momentum and gradient tracking techniques
    • Convergence rates of the proposed algorithm are established for both convex and nonconvex problems; to the authors’ best knowledge, these are the first Frank-Wolfe convergence rate results for distributed stochastic optimization (see the note after these highlights)
    • The efficacy of the algorithm is demonstrated by numerical simulations against a number of competing alternatives
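
For concreteness, the reported guarantees can be read as follows. This is a hedged restatement: the use of the Frank-Wolfe gap ${\cal{G}}(x) = \max_{s \in {\cal{X}}} \langle \nabla f(x),\, x - s \rangle$ as the nonconvex optimality measure is an assumption here, it being the standard stationarity metric for projection-free methods. For convex objectives, $\mathbb{E}[f(x_k) - f(x^*)] = {\cal{O}}(k^{-\frac{1}{2}})$; for nonconvex objectives, $\min_{t \leq k} \mathbb{E}[{\cal{G}}(x_t)] = {\cal{O}}(1/\log_2(k))$.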
