IEEE/CAA Journal of Automatica Sinica
Citation: A. Agrawal, S. Barratt, and S. Boyd, "Learning Convex Optimization Models," IEEE/CAA J. Autom. Sinica, vol. 8, no. 8, pp. 1355–1364, Aug. 2021. doi: 10.1109/JAS.2021.1004075
[1] A. Agrawal, B. Amos, S. Barratt, S. Boyd, S. Diamond, and J. Z. Kolter, "Differentiable convex optimization layers," in Proc. 33rd Conf. Neural Information Processing Systems, Vancouver, Canada, 2019, pp. 9558–9570.
[2] A. Agrawal, S. Barratt, S. Boyd, E. Busseti, and W. Moursi, "Differentiating through a cone program," J. Appl. Numer. Optim., vol. 1, no. 2, pp. 107–115, Aug. 2019.
[3] B. Amos, "Differentiable optimization-based modeling for machine learning," Ph.D. dissertation, Carnegie Mellon Univ., Pittsburgh, USA, 2019.
[4] E. Busseti, W. M. Moursi, and S. Boyd, "Solution refinement at regular points of conic problems," Comput. Optim. Appl., vol. 74, no. 3, pp. 627–643, Aug. 2019. doi: 10.1007/s10589-019-00122-9
[5] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, USA: Cambridge University Press, 2004.
[6] G. Bakır, T. Hofmann, B. Schölkopf, A. J. Smola, B. Taskar, and S. V. N. Vishwanathan, Predicting Structured Data. Cambridge, USA: MIT Press, 2007.
[7] Y. LeCun, S. Chopra, R. Hadsell, M. A. Ranzato, and F. J. Huang, "A tutorial on energy-based learning," in Predicting Structured Data, G. Bakır, T. Hofmann, B. Schölkopf, A. Smola, and B. Taskar, Eds. Cambridge, USA: MIT Press, 2006.
[8] B. Taskar, V. Chatalbashev, D. Koller, and C. Guestrin, "Learning structured prediction models: A large margin approach," in Proc. 22nd Int. Conf. Machine Learning, Bonn, Germany, 2005, pp. 896–903.
[9] B. Taskar, C. Guestrin, and D. Koller, "Max-margin Markov networks," in Proc. 16th Int. Conf. Neural Information Processing Systems, Vancouver and Whistler, British Columbia, Canada, 2004, pp. 25–32.
[10] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, "Large margin methods for structured and interdependent output variables," J. Mach. Learn. Res., vol. 6, pp. 1453–1484, Dec. 2005.
[11] S. Chopra, R. Hadsell, and Y. LeCun, "Learning a similarity metric discriminatively, with application to face verification," in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition, San Diego, USA, 2005, pp. 539–546.
[12] D. Belanger and A. McCallum, "Structured prediction energy networks," in Proc. 33rd Int. Conf. Machine Learning, New York, USA, 2016, pp. 983–992.
[13] D. Belanger, B. S. Yang, and A. McCallum, "End-to-end learning for structured prediction energy networks," in Proc. 34th Int. Conf. Machine Learning, Sydney, Australia, 2017, pp. 429–439.
[14] B. Amos, L. Xu, and J. Z. Kolter, "Input convex neural networks," in Proc. 34th Int. Conf. Machine Learning, Sydney, Australia, 2017, pp. 146–155.
[15] J. Peng, L. F. Bo, and J. B. Xu, "Conditional neural fields," in Proc. 22nd Int. Conf. Neural Information Processing Systems, Vancouver, Canada, 2009, pp. 1419–1427.
[16] S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Z. Su, D. L. Du, C. Huang, and P. H. S. Torr, "Conditional random fields as recurrent neural networks," in Proc. IEEE Int. Conf. Computer Vision, Santiago, Chile, 2015, pp. 1529–1537.
[17] L. C. Chen, A. G. Schwing, A. L. Yuille, and R. Urtasun, "Learning deep structured models," in Proc. 32nd Int. Conf. Machine Learning, Lille, France, 2015, pp. 1785–1794.
[18] Z. L. Geng, D. Johnson, and R. Fedkiw, "Coercing machine learning to output physically accurate results," J. Comput. Phys., vol. 406, Article No. 109099, Apr. 2020. doi: 10.1016/j.jcp.2019.109099
[19] R. K. Ahuja and J. B. Orlin, "Inverse optimization," Oper. Res., vol. 49, no. 5, pp. 771–783, Oct. 2001. doi: 10.1287/opre.49.5.771.10607
[20] C. Heuberger, "Inverse combinatorial optimization: A survey on problems, methods, and results," J. Comb. Optim., vol. 8, no. 3, pp. 329–361, Sept. 2004. doi: 10.1023/B:JOCO.0000038914.26975.9b
[21] V. Chatalbashev, "Inverse convex optimization," M.S. thesis, Stanford Univ., Stanford, USA, 2005.
[22] A. Keshavarz, Y. Wang, and S. Boyd, "Imputing a convex objective function," in Proc. IEEE Int. Symp. Intelligent Control, Denver, USA, 2011, pp. 613–619.
[23] B. Amos and J. Z. Kolter, "OptNet: Differentiable optimization as a layer in neural networks," in Proc. 34th Int. Conf. Machine Learning, Sydney, Australia, 2017, pp. 136–145.
[24] A. V. Fiacco and G. P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques. New York, USA: John Wiley & Sons Ltd, 1990.
[25] A. V. Fiacco, "Sensitivity analysis for nonlinear programming using penalty methods," Math. Program., vol. 10, no. 1, pp. 287–311, Dec. 1976. doi: 10.1007/BF01580677
[26] B. Amos, I. D. Jimenez, J. Sacks, B. Boots, and J. Z. Kolter, "Differentiable MPC for end-to-end planning and control," in Proc. 32nd Int. Conf. Neural Information Processing Systems, Montreal, Canada, 2018, pp. 8299–8310.
[27] F. de A. Belbute-Peres, K. Smith, K. R. Allen, J. B. Tenenbaum, and J. Z. Kolter, "End-to-end differentiable physics for learning and control," in Proc. 32nd Conf. Neural Information Processing Systems, Montreal, Canada, 2018, pp. 7178–7189.
[28] S. T. Barratt and S. P. Boyd, "Fitting a Kalman smoother to data," in Proc. American Control Conf., Denver, USA, 2020.
[29] A. Agrawal, S. Barratt, S. Boyd, and B. Stellato, "Learning convex optimization control policies," in Proc. 2nd Annu. Conf. Learning for Dynamics and Control, 2020.
[30] C. K. Ling, F. Fang, and J. Z. Kolter, "What game are we playing? End-to-end learning in normal and extensive form games," in Proc. 27th Int. Joint Conf. Artificial Intelligence, Stockholm, Sweden, 2018.
[31] C. K. Ling, F. Fang, and J. Z. Kolter, "Large scale learning of agent rationality in two-player zero-sum games," in Proc. AAAI Conf. Artificial Intelligence, Palo Alto, USA, 2019, pp. 6104–6111.
[32] Q. Berthet, M. Blondel, O. Teboul, M. Cuturi, J. P. Vert, and F. Bach, "Learning with differentiable perturbed optimizers," arXiv preprint arXiv: 2002.08676, Jun. 2020.
[33] S. Barratt, G. Angeris, and S. Boyd, "Automatic repair of convex optimization problems," Optim. Eng., vol. 22, pp. 247–259, 2021. doi: 10.1007/s11081-020-09508-9
[34] B. Amos and D. Yarats, "The differentiable cross-entropy method," arXiv preprint arXiv: 1909.12830, Aug. 2020.
[35] S. Barratt and S. Boyd, "Least squares auto-tuning," Eng. Optim., vol. 53, no. 5, pp. 789–810, May 2021. doi: 10.1080/0305215X.2020.1754406
[36] S. Barratt and R. Sharma, "Optimizing for generalization in machine learning with cross-validation gradients," arXiv preprint arXiv: 1805.07072, May 2018.
[37] S. Diamond, V. Sitzmann, F. Heide, and G. Wetzstein, "Unrolled optimization with deep priors," arXiv preprint arXiv: 1705.08041, Dec. 2018.
[38] J. Domke, "Generic methods for optimization-based modeling," in Proc. 15th Int. Conf. Artificial Intelligence and Statistics, La Palma, Canary Islands, 2012, pp. 318–326.
[39] C. Finn, "Learning to learn with gradients," Univ. California, Berkeley, USA, Tech. Rep. No. UCB/EECS-2018-105, 2018.
[40] D. Maclaurin, D. Duvenaud, and R. P. Adams, "Gradient-based hyperparameter optimization through reversible learning," in Proc. 32nd Int. Conf. Machine Learning, Lille, France, 2015, pp. 2113–2122.
[41] A. Agrawal and S. Boyd, "Differentiating through log-log convex programs," arXiv preprint arXiv: 2004.12553, May 2020.
[42] J. Lorraine, P. Vicol, and D. Duvenaud, "Optimizing millions of hyperparameters by implicit differentiation," arXiv preprint arXiv: 1911.02590, Nov. 2019.
[43] N. Hansen and A. Ostermeier, "Completely derandomized self-adaptation in evolution strategies," Evol. Comput., vol. 9, no. 2, pp. 159–195, Jun. 2001. doi: 10.1162/106365601750190398
[44] J. Močkus, "On Bayesian methods for seeking the extremum," in Proc. IFIP Technical Conf. Optimization Techniques, Novosibirsk, Russia, 1974, pp. 400–404.
[45] F. J. Solis and R. J. B. Wets, "Minimization by random search techniques," Math. Oper. Res., vol. 6, no. 1, pp. 19–30, Feb. 1981. doi: 10.1287/moor.6.1.19
[46] A. Beck and M. Teboulle, "Gradient-based algorithms with applications to signal-recovery," in Convex Optimization in Signal Processing and Communications, D. P. Palomar and Y. C. Eldar, Eds. Cambridge, USA: Cambridge University Press, 2009, pp. 42–88.
[47] L. Bottou, "Large-scale machine learning with stochastic gradient descent," in Proc. COMPSTAT, Y. Lechevallier and G. Saporta, Eds. Heidelberg, Germany: Physica-Verlag, 2010, pp. 177–186.
[48] J. Nocedal and S. J. Wright, Numerical Optimization. New York, USA: Springer, 2006.
[49] C. M. Bishop, Pattern Recognition and Machine Learning. New York, USA: Springer, 2006.
[50] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, USA: MIT Press, 2016.
[51] R. E. Barlow and H. D. Brunk, "The isotonic regression problem and its dual," J. Am. Stat. Assoc., vol. 67, no. 337, pp. 140–147, Mar. 1972.
[52] M. J. Best and N. Chakravarti, "Active set algorithms for isotonic regression: A unifying framework," Math. Program., vol. 47, no. 1–3, pp. 425–439, May 1990. doi: 10.1007/BF01580873
[53] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York, USA: Springer Science & Business Media, 2009.
[54] S. J. Kim, K. Koh, S. Boyd, and D. Gorinevsky, "ℓ1 trend filtering," SIAM Rev., vol. 51, no. 2, pp. 339–360, 2009.
[55] D. Bertsekas, Dynamic Programming and Optimal Control, Vol. I. 4th ed. Belmont, USA: Athena Scientific, 2017.