IEEE/CAA Journal of Automatica Sinica
Citation: A. Agrawal, S. Barratt, and S. Boyd, "Learning Convex Optimization Models," IEEE/CAA J. Autom. Sinica, vol. 8, no. 8, pp. 1355–1364, Aug. 2021. doi: 10.1109/JAS.2021.1004075
[1] A. Agrawal, B. Amos, S. Barratt, S. Boyd, S. Diamond, and J. Z. Kolter, "Differentiable convex optimization layers," in Proc. 33rd Conf. Neural Information Processing Systems, Vancouver, Canada, 2019, pp. 9558–9570.
[2] A. Agrawal, S. Barratt, S. Boyd, E. Busseti, and W. Moursi, "Differentiating through a cone program," J. Appl. Numer. Optim., vol. 1, no. 2, pp. 107–115, Aug. 2019.
[3] B. Amos, "Differentiable optimization-based modeling for machine learning," Ph.D. dissertation, Carnegie Mellon Univ., Pittsburgh, USA, 2019.
[4] E. Busseti, W. M. Moursi, and S. Boyd, "Solution refinement at regular points of conic problems," Comput. Optim. Appl., vol. 74, no. 3, pp. 627–643, Aug. 2019. doi: 10.1007/s10589-019-00122-9
[5] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, USA: Cambridge University Press, 2004.
[6] G. Bakır, T. Hofmann, B. Schölkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data. Cambridge, USA: MIT Press, 2007.
[7] Y. LeCun, S. Chopra, R. Hadsell, M. A. Ranzato, and F. J. Huang, "A tutorial on energy-based learning," in Predicting Structured Data, G. Bakır, T. Hofmann, B. Schölkopf, A. Smola, and B. Taskar, Eds. Cambridge, USA: MIT Press, 2006.
[8] B. Taskar, V. Chatalbashev, D. Koller, and C. Guestrin, "Learning structured prediction models: A large margin approach," in Proc. 22nd Int. Conf. Machine Learning, Bonn, Germany, 2005, pp. 896–903.
[9] B. Taskar, C. Guestrin, and D. Koller, "Max-margin Markov networks," in Proc. 16th Int. Conf. Neural Information Processing Systems, Vancouver and Whistler, British Columbia, Canada, 2004, pp. 25–32.
[10] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," J. Mach. Learn. Res., vol. 6, pp. 1453–1484, Dec. 2005.
[11] S. Chopra, R. Hadsell, and Y. LeCun, "Learning a similarity metric discriminatively, with application to face verification," in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition, San Diego, USA, 2005, pp. 539–546.
[12] D. Belanger and A. McCallum, "Structured prediction energy networks," in Proc. 33rd Int. Conf. Machine Learning, New York, USA, 2016, pp. 983–992.
[13] D. Belanger, B. S. Yang, and A. McCallum, "End-to-end learning for structured prediction energy networks," in Proc. 34th Int. Conf. Machine Learning, Sydney, Australia, 2017, pp. 429–439.
[14] B. Amos, L. Xu, and J. Z. Kolter, "Input convex neural networks," in Proc. 34th Int. Conf. Machine Learning, Sydney, Australia, 2017, pp. 146–155.
[15] J. Peng, L. F. Bo, and J. B. Xu, "Conditional neural fields," in Proc. 22nd Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2009, pp. 1419–1427.
[16] S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Z. Su, D. L. Du, C. Huang, and P. H. S. Torr, "Conditional random fields as recurrent neural networks," in Proc. IEEE Int. Conf. Computer Vision, Santiago, Chile, 2015, pp. 1529–1537.
[17] L. C. Chen, A. G. Schwing, A. L. Yuille, and R. Urtasun, "Learning deep structured models," in Proc. 32nd Int. Conf. Machine Learning, Lille, France, 2015, pp. 1785–1794.
[18] Z. L. Geng, D. Johnson, and R. Fedkiw, "Coercing machine learning to output physically accurate results," J. Comput. Phys., vol. 406, Article No. 109099, Apr. 2020. doi: 10.1016/j.jcp.2019.109099
[19] R. K. Ahuja and J. B. Orlin, "Inverse optimization," Oper. Res., vol. 49, no. 5, pp. 771–783, Oct. 2001. doi: 10.1287/opre.49.5.771.10607
[20] C. Heuberger, "Inverse combinatorial optimization: A survey on problems, methods, and results," J. Comb. Optim., vol. 8, no. 3, pp. 329–361, Sept. 2004. doi: 10.1023/B:JOCO.0000038914.26975.9b
[21] V. Chatalbashev, "Inverse convex optimization," M.S. thesis, Stanford Univ., Stanford, USA, 2005.
[22] A. Keshavarz, Y. Wang, and S. Boyd, "Imputing a convex objective function," in Proc. IEEE Int. Symp. Intelligent Control, Denver, USA, 2011, pp. 613–619.
[23] B. Amos and J. Z. Kolter, "OptNet: Differentiable optimization as a layer in neural networks," in Proc. 34th Int. Conf. Machine Learning, Sydney, Australia, 2017, pp. 136–145.
[24] A. V. Fiacco and G. P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques. New York, USA: John Wiley & Sons Ltd, 1990.
[25] A. V. Fiacco, "Sensitivity analysis for nonlinear programming using penalty methods," Math. Program., vol. 10, no. 1, pp. 287–311, Dec. 1976. doi: 10.1007/BF01580677
[26] B. Amos, I. D. Jimenez, J. Sacks, B. Boots, and J. Z. Kolter, "Differentiable MPC for end-to-end planning and control," in Proc. 32nd Int. Conf. Neural Information Processing Systems, Montreal, Canada, 2018, pp. 8299–8310.
[27] F. de A. Belbute-Peres, K. Smith, K. R. Allen, J. B. Tenenbaum, and J. Z. Kolter, "End-to-end differentiable physics for learning and control," in Proc. 32nd Conf. Neural Information Processing Systems, Montreal, Canada, 2018, pp. 7178–7189.
[28] S. T. Barratt and S. P. Boyd, "Fitting a Kalman smoother to data," in Proc. American Control Conf., Denver, USA, 2020.
[29] A. Agrawal, S. Barratt, S. Boyd, and B. Stellato, "Learning convex optimization control policies," in Proc. 2nd Annu. Conf. Learning for Dynamics and Control, 2020.
[30] C. K. Ling, F. Fang, and J. Z. Kolter, "What game are we playing? End-to-end learning in normal and extensive form games," in Proc. 27th Int. Joint Conf. Artificial Intelligence, Stockholm, Sweden, 2018.
[31] C. K. Ling, F. Fang, and J. Z. Kolter, "Large scale learning of agent rationality in two-player zero-sum games," in Proc. AAAI Conf. Artificial Intelligence, Palo Alto, USA, 2019, pp. 6104–6111.
[32] Q. Berthet, M. Blondel, O. Teboul, M. Cuturi, J. P. Vert, and F. Bach, "Learning with differentiable perturbed optimizers," arXiv preprint arXiv: 2002.08676, Jun. 2020.
[33] S. Barratt, G. Angeris, and S. Boyd, "Automatic repair of convex optimization problems," Optim. Eng., vol. 22, pp. 247–259, 2021. doi: 10.1007/s11081-020-09508-9
[34] B. Amos and D. Yarats, "The differentiable cross-entropy method," arXiv preprint arXiv: 1909.12830, Aug. 2020.
[35] S. Barratt and S. Boyd, "Least squares auto-tuning," Eng. Optim., vol. 53, no. 5, pp. 789–810, May 2021. doi: 10.1080/0305215X.2020.1754406
[36] S. Barratt and R. Sharma, "Optimizing for generalization in machine learning with cross-validation gradients," arXiv preprint arXiv: 1805.07072, May 2018.
[37] S. Diamond, V. Sitzmann, F. Heide, and G. Wetzstein, "Unrolled optimization with deep priors," arXiv preprint arXiv: 1705.08041, Dec. 2018.
[38] J. Domke, "Generic methods for optimization-based modeling," in Proc. 15th Int. Conf. Artificial Intelligence and Statistics, La Palma, Canary Islands, 2012, pp. 318–326.
[39] C. Finn, "Learning to learn with gradients," Univ. California, Berkeley, USA, Tech. Rep. No. UCB/EECS-2018-105, 2018.
[40] D. Maclaurin, D. Duvenaud, and R. P. Adams, "Gradient-based hyperparameter optimization through reversible learning," in Proc. 32nd Int. Conf. Machine Learning, Lille, France, 2015, pp. 2113–2122.
[41] A. Agrawal and S. Boyd, "Differentiating through log-log convex programs," arXiv preprint arXiv: 2004.12553, May 2020.
[42] J. Lorraine, P. Vicol, and D. Duvenaud, "Optimizing millions of hyperparameters by implicit differentiation," arXiv preprint arXiv: 1911.02590, Nov. 2019.
[43] N. Hansen and A. Ostermeier, "Completely derandomized self-adaptation in evolution strategies," Evol. Comput., vol. 9, no. 2, pp. 159–195, Jun. 2001. doi: 10.1162/106365601750190398
[44] J. Močkus, "On Bayesian methods for seeking the extremum," in Proc. IFIP Technical Conf. Optimization Techniques, Novosibirsk, Russia, 1974, pp. 400–404.
[45] F. J. Solis and R. J. B. Wets, "Minimization by random search techniques," Math. Oper. Res., vol. 6, no. 1, pp. 19–30, Feb. 1981. doi: 10.1287/moor.6.1.19
[46] A. Beck and M. Teboulle, "Gradient-based algorithms with applications to signal-recovery," in Convex Optimization in Signal Processing and Communications, D. P. Palomar and Y. C. Eldar, Eds. Cambridge, USA: Cambridge University Press, 2009, pp. 42–88.
[47] L. Bottou, "Large-scale machine learning with stochastic gradient descent," in Proc. COMPSTAT, Y. Lechevallier and G. Saporta, Eds. Heidelberg, Germany: Physica-Verlag, 2010, pp. 177–186.
[48] J. Nocedal and S. J. Wright, Numerical Optimization. New York, USA: Springer, 2006.
[49] C. M. Bishop, Pattern Recognition and Machine Learning. New York, USA: Springer, 2006.
[50] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, USA: MIT Press, 2016.
[51] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," J. Am. Stat. Assoc., vol. 67, no. 337, pp. 140–147, Mar. 1972.
[52] M. J. Best and N. Chakravarti, "Active set algorithms for isotonic regression: A unifying framework," Math. Program., vol. 47, no. 1–3, pp. 425–439, May 1990. doi: 10.1007/BF01580873
[53] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York, USA: Springer Science & Business Media, 2009.
[54] S. J. Kim, K. Koh, S. Boyd, and D. Gorinevsky, "ℓ1 trend filtering," SIAM Rev., vol. 51, no. 2, pp. 339–360, 2009.
[55] D. Bertsekas, Dynamic Programming and Optimal Control, Vol. I. 4th ed. Belmont, USA: Athena Scientific, 2017.