IEEE/CAA Journal of Automatica Sinica
Citation: J. Hou, X. Zeng, G. Wang, J. Sun, and J. Chen, “Distributed momentum-based Frank-Wolfe algorithm for stochastic optimization,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 3, pp. 685–699, Mar. 2023. doi: 10.1109/JAS.2022.105923
[1] J. Chen, J. Sun, and G. Wang, “From unmanned systems to autonomous intelligent systems,” Eng., vol. 12, no. 5, pp. 16–19, 2022.
[2] Z. Jiang, Q. Jia, and X. Guan, “On large action space in EV charging scheduling optimization,” Sci. China Inf. Sci., vol. 65, p. 122201, 2022.
[3] R. Yang, L. Liu, and G. Feng, “An overview of recent advances in distributed coordination of multi-agent systems,” Unmanned Syst., vol. 10, no. 3, pp. 307–325, 2022. doi: 10.1142/S2301385021500199
[4] L. Bottou, “Large-scale machine learning with stochastic gradient descent,” in Proc. COMPSTAT. Heidelberg: Physica-Verlag HD, 2010, pp. 177–186.
[5] X. Wang, J. Sun, G. Wang, F. Allgöwer, and J. Chen, “Data-driven control of distributed event-triggered network systems,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 2, pp. 351–364, Feb. 2023.
[6] G. Wang, S. Lu, G. B. Giannakis, G. Tesauro, and J. Sun, “Decentralized TD tracking with linear function approximation and its finite-time analysis,” in Proc. Adv. Neural Inf. Process. Syst., 2020.
[7] D. Wang, Z. Wang, and Z. Wu, “Distributed convex optimization for nonlinear multi-agent systems disturbed by a second-order stationary process over a digraph,” Sci. China Inf. Sci., vol. 65, p. 132201, 2022. doi: 10.1007/s11432-020-3111-4
[8] A. Mokhtari, H. Hassani, and A. Karbasi, “Stochastic conditional gradient methods: From convex minimization to submodular maximization,” J. Mach. Learn. Res., vol. 21, no. 105, pp. 1–49, 2020.
[9] Z. Akhtar and K. Rajawat, “Momentum based projection free stochastic optimization under affine constraints,” in Proc. American Control Conf., 2021, pp. 2619–2624.
[10] S. J. Reddi, S. Sra, B. Póczos, and A. Smola, “Stochastic Frank-Wolfe methods for nonconvex optimization,” in Proc. Annu. Allerton Conf. Commun., Control, Comput., 2016, pp. 1244–1251.
[11] Y. Li, X. Wang, J. Sun, G. Wang, and J. Chen, “Data-driven consensus control of fully distributed event-triggered multi-agent systems,” Sci. China Inf. Sci., 2022. doi: 10.1007/s11432-022-3629-1
[12] S. Pu, A. Olshevsky, and I. C. Paschalidis, “Asymptotic network independence in distributed stochastic optimization for machine learning: Examining distributed and centralized stochastic gradient descent,” IEEE Signal Process. Mag., vol. 37, no. 3, pp. 114–122, 2020. doi: 10.1109/MSP.2020.2975212
[13] Z. Wang, J. Zhang, T.-H. Chang, J. Li, and Z.-Q. Luo, “Distributed stochastic consensus optimization with momentum for nonconvex nonsmooth problems,” IEEE Trans. Signal Process., vol. 69, pp. 4486–4501, 2021. doi: 10.1109/TSP.2021.3097211
[14] G. Wang, G. B. Giannakis, and J. Chen, “Learning ReLU networks on linearly separable data: Algorithm, optimality, and generalization,” IEEE Trans. Signal Process., vol. 67, no. 9, pp. 2357–2370, 2019. doi: 10.1109/TSP.2019.2904921
[15] Y. Dai and Y. Weng, “Synchronous parallel block coordinate descent method for nonsmooth convex function minimization,” J. Syst. Sci. Complex., vol. 33, no. 2, pp. 345–365, 2020. doi: 10.1007/s11424-020-8313-y
[16] A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, “Robust stochastic approximation approach to stochastic programming,” SIAM J. Optim., vol. 19, no. 4, pp. 1574–1609, 2009. doi: 10.1137/070704277
[17] Y. Zhang, Y. Lou, Y. Hong, and L. Xie, “Distributed projection-based algorithms for source localization in wireless sensor networks,” IEEE Trans. Wireless Commun., vol. 14, no. 6, pp. 3131–3142, 2015. doi: 10.1109/TWC.2015.2402672
[18] X.-F. Wang, Y. Hong, X.-M. Sun, and K.-Z. Liu, “Distributed optimization for resource allocation problems under large delays,” IEEE Trans. Ind. Electron., vol. 66, no. 12, pp. 9448–9457, 2019. doi: 10.1109/TIE.2019.2891406
[19] S. Ghadimi and G. Lan, “Stochastic first- and zeroth-order methods for nonconvex stochastic programming,” SIAM J. Optim., vol. 23, no. 4, pp. 2341–2368, 2013. doi: 10.1137/120880811
[20] A. Agrawal, S. Barratt, and S. Boyd, “Learning convex optimization models,” IEEE/CAA J. Autom. Sinica, vol. 8, no. 8, pp. 1355–1364, 2021. doi: 10.1109/JAS.2021.1004075
[21] S. Fujishige and S. Isotani, “A submodular function minimization algorithm based on the minimum-norm base,” Pacific J. Optim., vol. 7, no. 1, pp. 3–17, 2011.
[22] M. Frank and P. Wolfe, “An algorithm for quadratic programming,” Nav. Res. Logist., vol. 3, no. 1–2, pp. 95–110, 1956. doi: 10.1002/nav.3800030109
[23] G. Lan and Y. Zhou, “Conditional gradient sliding for convex optimization,” SIAM J. Optim., vol. 26, no. 2, pp. 1379–1409, 2016.
[24] D. S. Kalhan, A. S. Bedi, A. Koppel, K. Rajawat, H. Hassani, A. K. Gupta, and A. Banerjee, “Dynamic online learning via Frank-Wolfe algorithm,” IEEE Trans. Signal Process., vol. 69, pp. 932–947, 2021. doi: 10.1109/TSP.2021.3051871
[25] T. Kerdreux, A. d’Aspremont, and S. Pokutta, “Projection-free optimization on uniformly convex sets,” in Proc. Artif. Intell. Statist., 2021, vol. 130, pp. 19–27.
[26] B. Li, M. Coutiño, G. B. Giannakis, and G. Leus, “A momentum-guided Frank-Wolfe algorithm,” IEEE Trans. Signal Process., vol. 69, pp. 3597–3611, 2021. doi: 10.1109/TSP.2021.3087910
[27] Y. Zhang, B. Li, and G. B. Giannakis, “Accelerating Frank-Wolfe with weighted average gradients,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2021, pp. 5529–5533.
[28] E. Hazan and S. Kale, “Projection-free online learning,” in Proc. 29th Int. Conf. Mach. Learn., 2012, pp. 521–528.
[29] E. Hazan and H. Luo, “Variance-reduced and projection-free stochastic optimization,” in Proc. Int. Conf. Mach. Learn., 2016, pp. 1263–1271.
[30] K. Bekiroglu, S. Srinivasan, E. Png, R. Su, and C. Lagoa, “Recursive approximation of complex behaviours with IoT-data imperfections,” IEEE/CAA J. Autom. Sinica, vol. 7, no. 3, pp. 656–667, 2020. doi: 10.1109/JAS.2020.1003126
[31] M. Weber, “Projection-free nonconvex stochastic optimization on Riemannian manifolds,” IMA J. Numer. Anal., vol. 42, no. 4, pp. 3241–3271, 2021. doi: 10.1093/imanum/drab066
[32] A. Bellet, Y. Liang, A. B. Garakani, M. Balcan, and F. Sha, “A distributed Frank-Wolfe algorithm for communication-efficient sparse learning,” in Proc. SIAM Int. Conf. Data Mining, 2015, pp. 478–486.
[33] H.-T. Wai, A. Scaglione, J. Lafond, and E. Moulines, “A projection-free decentralized algorithm for non-convex optimization,” in Proc. IEEE Global Conf. Signal and Inf. Process., 2016, pp. 475–479.
[34] H.-T. Wai, J. Lafond, A. Scaglione, and E. Moulines, “Decentralized Frank-Wolfe algorithm for convex and nonconvex problems,” IEEE Trans. Autom. Control, vol. 62, no. 11, pp. 5522–5537, 2017. doi: 10.1109/TAC.2017.2685559
[35] G. Chen, P. Yi, and Y. Hong, “Distributed optimization with projection-free dynamics,” arXiv preprint arXiv:2105.02450, 2021.
[36] L. Zhang, G. Wang, D. Romero, and G. B. Giannakis, “Randomized block Frank-Wolfe for convergent large-scale learning,” IEEE Trans. Signal Process., vol. 65, no. 24, pp. 6448–6461, 2017. doi: 10.1109/TSP.2017.2755597
[37] J. Xie, C. Zhang, Z. Shen, C. Mi, and H. Qian, “Decentralized gradient tracking for continuous DR-submodular maximization,” in Proc. Int. Conf. Artif. Intell. Statist., 2019.
[38] H. Gao, H. Xu, and S. Vucetic, “Sample efficient decentralized stochastic Frank-Wolfe methods for continuous DR-submodular maximization,” in Proc. Int. Joint Conf. Artif. Intell., 2021, pp. 3501–3507.
[39] A. Cutkosky and F. Orabona, “Momentum-based variance reduction in non-convex SGD,” in Proc. Adv. Neural Inf. Process. Syst., 2019, pp. 15210–15219.
[40] Y. E. Nesterov, “A method for solving the convex programming problem with convergence rate O(1/k²),” Dokl. Akad. Nauk SSSR, vol. 269, no. 3, pp. 543–547, 1983.
[41] J. N. Tsitsiklis, “Problems in decentralized decision making and computation,” Ph.D. dissertation, Dept. Elect. Eng. Comput. Sci., MIT, Cambridge, MA, USA, 1984.
[42] Y. Zhang, Y. Lou, and Y. Hong, “An approximate gradient algorithm for constrained distributed convex optimization,” IEEE/CAA J. Autom. Sinica, vol. 1, no. 1, pp. 61–67, 2014. doi: 10.1109/JAS.2014.7004621
[43] X. Ren, D. Li, Y. Xi, and H. Shao, “Distributed subgradient algorithm for multi-agent optimization with dynamic stepsize,” IEEE/CAA J. Autom. Sinica, vol. 8, no. 8, pp. 1451–1464, 2021. doi: 10.1109/JAS.2021.1003904
[44] J. Gao, X.-W. Liu, Y.-H. Dai, Y. Huang, and P. Yang, “Achieving geometric convergence for distributed optimization with Barzilai-Borwein step sizes,” Sci. China Inf. Sci., vol. 65, p. 149204, 2022.
[45] P. D. Lorenzo and G. Scutari, “NEXT: In-network nonconvex optimization,” IEEE Trans. Signal Inf. Process. Netw., vol. 2, no. 2, pp. 120–136, 2016.
[46] Z. Li, B. Liu, and Z. Ding, “Consensus-based cooperative algorithms for training over distributed data sets using stochastic gradients,” IEEE Trans. Neural Netw. Learn. Syst., vol. 33, no. 10, pp. 5579–5589, 2022.
[47] X. Ma, P. Yi, and J. Chen, “Distributed gradient tracking methods with finite data rates,” J. Syst. Sci. Complex., vol. 34, no. 5, pp. 1927–1952, 2021. doi: 10.1007/s11424-021-1231-9