
## On Structured Prediction Theory with Calibrated Convex Surrogate Losses

P. 302-313.

We provide novel theoretical insights on structured prediction in the context of efficient convex surrogate loss minimization with consistency guarantees. For any task loss, we construct a convex surrogate that can be optimized via stochastic gradient descent and we prove tight bounds on the so-called "calibration function" relating the excess surrogate risk to the actual risk. In contrast to prior related work, we carefully monitor the effect of the exponential number of classes in the learning guarantees as well as on the optimization complexity. As an interesting consequence, we formalize the intuition that some task losses make learning harder than others, and that the classical 0-1 loss is ill-suited for structured prediction.
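A toy illustration (not from the paper) of the surrogate idea in the abstract: for a finite label space with a task-loss matrix `L`, a quadratic surrogate over score vectors can be minimized by SGD, and its population minimizer recovers the Bayes-optimal prediction. The loss matrix, label distribution, and step sizes below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical task-loss matrix L[c, y]: cost of predicting c when truth is y.
L = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])
k = L.shape[0]
p = np.array([0.6, 0.3, 0.1])         # label distribution (no features, for clarity)

# Quadratic surrogate phi(f, y) = 1/(2k) * sum_c (f[c] + L[c, y])^2.
# Its population minimizer is f* = -E_y[L[:, y]], so argmax_c f[c]
# coincides with the Bayes prediction argmin_c E_y[L[c, y]].
f = np.zeros(k)
for t in range(1, 20001):
    y = rng.choice(k, p=p)            # draw a label
    grad = (f + L[:, y]) / k          # stochastic gradient of phi in f
    f -= grad / np.sqrt(t)            # SGD with 1/sqrt(t) steps

bayes = int(np.argmin(L @ p))         # minimizer of the expected task loss
pred = int(np.argmax(f))              # prediction induced by the surrogate
```

Note how nothing here depends on the label space being small; the paper's point is precisely what happens to such guarantees when the label space is exponentially large.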

### In book

Montreal: Curran Associates, 2017

Struminsky K., Simon Lacoste-Julien, Osokin A., in: Advances in Neural Information Processing Systems 31 (NIPS 2018). 2018. P. 1-9.

We study consistency properties of machine learning methods based on minimizing convex surrogates. We extend the recent framework of Osokin et al. (2017) for the quantitative analysis of consistency properties to the case of inconsistent surrogates. Our key technical contribution consists in a new lower bound on the calibration function for the quadratic surrogate, which ...

Added: October 29, 2018

Osokin A., Jean-Baptiste Alayrac, Isabella Lukasewitz et al., in: Proceedings of Machine Learning Research. Proceedings of the International Conference on Machine Learning (ICML 2016). Vol. 48. NY, 2016. P. 885-925.

In this paper, we propose several improvements on the block-coordinate Frank-Wolfe (BCFW) algorithm from Lacoste-Julien et al. (2013) recently used to optimize the structured support vector machine (SSVM) objective in the context of structured prediction, though it has wider applications. The key intuition behind our improvements is that the estimates of block gaps maintained by ...
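The block-gap intuition can be sketched on a toy separable problem. Everything below is a simplified stand-in: the paper's setting is the structured SVM dual, while here each block is a probability simplex and a greedy gap-based rule replaces the paper's sampling schemes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy separable problem: minimize f(x) = 0.5 * ||x - b||^2 over a product of
# probability simplices, one simplex per block.
n_blocks, dim = 4, 5
b = rng.normal(size=(n_blocks, dim))
x = np.full((n_blocks, dim), 1.0 / dim)   # start at the simplex centers
gaps = np.full(n_blocks, np.inf)          # maintained block-gap estimates

for t in range(2000):
    # Visit every block once, then prefer the largest (possibly stale) gap.
    i = int(np.argmax(gaps)) if np.isfinite(gaps).all() else t % n_blocks
    g = x[i] - b[i]                       # gradient restricted to block i
    s = np.zeros(dim)
    s[np.argmin(g)] = 1.0                 # linear minimization oracle (simplex)
    d = s - x[i]
    gaps[i] = float(-g @ d)               # refresh this block's FW gap
    if d @ d > 0:
        x[i] += min(1.0, gaps[i] / (d @ d)) * d   # exact line search, clipped

total_gap = 0.0                           # exact FW gap at the final iterate
for i in range(n_blocks):
    g = x[i] - b[i]
    total_gap += float(g @ x[i] - g.min())
```

The gap estimates go stale after each step, which is exactly the effect the paper's improvements are designed to handle; here the stale value is only used for block selection and is refreshed before each step.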

Added: October 19, 2017

2019

Volume 99: Conference on Learning Theory, 25-28 June 2019, Phoenix, USA ...

Added: October 31, 2020

Dvurechensky P., Gasnikov A., Petr Ostroukhov et al., Near-optimal tensor methods for minimizing the gradient norm of convex function. 2020.

Motivated by convex problems with linear constraints and, in particular, by entropy-regularized optimal transport, we consider the problem of finding ε-approximate stationary points, i.e. points with the norm of the objective gradient less than ε, of convex functions with Lipschitz p-th order derivatives. Lower complexity bounds for this problem were recently proposed in [Grapiglia and Nesterov, arXiv:1907.07053]. However, the ...
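The notion of an ε-approximate stationary point can be made concrete with a first-order baseline. The paper's tensor methods use higher derivatives to reach such points faster; the gradient-descent sketch below, with an illustrative quadratic, only shows what the stopping criterion means.

```python
import numpy as np

# An eps-approximate stationary point is any x with ||grad f(x)|| <= eps.
# Plain gradient descent with this stopping rule is the simplest baseline.
def gd_to_stationarity(grad, x0, step, eps, max_iter=100000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:   # eps-stationarity reached
            break
        x = x - step * g
    return x

# Example: f(x) = 0.5 * x^T A x, so grad f(x) = A x and the minimizer is 0.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])
x = gd_to_stationarity(lambda z: A @ z, [1.0, -1.0], step=0.4, eps=1e-6)
```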

Added: October 25, 2020

Pietro Galliani, Amir Dezfouli, Edwin V. Bonilla et al., in: Proceedings of Machine Learning Research. Vol. 54: Artificial Intelligence and Statistics. 2017. P. 353-361.

We develop an automated variational inference method for Bayesian structured prediction problems with Gaussian process (GP) priors and linear-chain likelihoods. Our approach does not need to know the details of the structured likelihood model and can scale up to a large number of observations. Furthermore, we show that the required expected likelihood term and its ...

Added: December 10, 2018

Pavel Dvurechensky, Alexander Gasnikov, Evgenia Gasnikova et al., in: Proceedings of DOOR 2016 Conference, special issue of CEUR Workshop Proceedings. Vol. 1623. CEUR Workshop Proceedings, 2016. P. 584-595.

In this paper, we consider a large class of hierarchical congestion population games. One can show that the equilibrium in a game of this type can be described as the minimum point of a properly constructed multi-level convex optimization problem. We propose a fast primal-dual composite gradient method and apply it to the problem, which ...

Added: November 17, 2017

Rémi Leblond, Jean-Baptiste Alayrac, Osokin A. et al., in: Proceedings of the 6th International Conference on Learning Representations (ICLR 2018). 2018. P. 1-16.

We propose SEARNN, a novel training algorithm for recurrent neural networks (RNNs) inspired by the "learning to search" (L2S) approach to structured prediction. RNNs have been widely successful in structured prediction applications such as machine translation or parsing, and are commonly trained using maximum likelihood estimation (MLE). Unfortunately, this training loss is not always an ...

Added: October 29, 2018

Dvurechensky P., Darina Dvinskikh, Gasnikov A. et al., in: Advances in Neural Information Processing Systems 31 (NeurIPS 2018). Neural Information Processing Systems Foundation, 2018. P. 10760-10770.

We study the decentralized distributed computation of discrete approximations to the regularized Wasserstein barycenter of a finite set of continuous probability measures stored in a distributed way over a network. We assume there is a network of agents/machines/computers, and each agent holds a private continuous probability measure and seeks to compute the barycenter of all the measures in ...

Added: October 31, 2020

Anton Anikin, Gasnikov A., Pavel Dvurechensky et al., Computational Mathematics and Mathematical Physics, 2017, Vol. 57, No. 8, P. 1262-1276.

A strongly convex function of simple (for example, separable) structure is minimized under affine constraints. A dual problem is constructed and solved by a fast gradient method. We establish the properties of this method that allow, under rather general conditions, the solution of the primal problem to be recovered with the same ...
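A minimal sketch of this dual scheme on a toy instance, assuming a quadratic primal objective and Nesterov's fast gradient method with a fixed 1/L step; the instance and step rule are illustrative.

```python
import numpy as np

# Toy instance: minimize the strongly convex f(x) = 0.5 * ||x||^2 subject to
# A x = b, by running a fast gradient method on the dual and recovering the
# primal solution from the dual one.
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0, 0.0]])
b = np.array([1.0, 2.0, 3.0])

def dual_grad(lam):
    x = -A.T @ lam                # minimizer of the Lagrangian in x
    return A @ x - b              # gradient of the concave dual = residual

L = np.linalg.norm(A @ A.T, 2)    # Lipschitz constant of the dual gradient
lam = y = np.zeros(3)
t = 1.0
for _ in range(2000):             # Nesterov's fast gradient method (ascent)
    lam_next = y + (1.0 / L) * dual_grad(y)
    t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
    y = lam_next + ((t - 1) / t_next) * (lam_next - lam)
    lam, t = lam_next, t_next

x = -A.T @ lam                    # primal recovery from the dual solution
```

The recovered `x` approaches the minimum-norm solution of `A x = b`, which is exactly the "recover the primal from the dual" step the abstract refers to.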

Added: November 29, 2018

Andrey Malinin, Mark Gales, in: Proceedings of the 9th International Conference on Learning Representations (ICLR 2021). ICLR, 2021. P. 1-31.

Added: November 1, 2021

Dvurechensky P., Gasnikov A., Evgenii Nurminski et al., in: Numerical Nonsmooth Optimization. Springer, 2020. P. 19-59.

This chapter is devoted to black-box subgradient algorithms with minimal requirements on the storage of auxiliary results needed to execute them. To provide historical perspective, the survey starts with Shor's original result, which opened this field with an application to the classical transportation problem. The theoretical complexity bounds ...
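The classical subgradient method such surveys build on fits in a few lines; the objective and step sizes below are illustrative.

```python
import numpy as np

# Subgradient method with diminishing steps on a nonsmooth convex function:
# f(x) = ||x - c||_1, whose subgradient at x is sign(x - c). Note the tiny
# storage footprint: only the current point and the best value seen.
c = np.array([1.0, -2.0, 0.5])         # illustrative target
x = np.zeros(3)
best = np.inf                          # best objective value seen so far
for t in range(1, 5001):
    g = np.sign(x - c)                 # a subgradient of f at x
    x = x - g / np.sqrt(t)             # diminishing step 1/sqrt(t)
    best = min(best, float(np.abs(x - c).sum()))
```

Tracking `best` matters because, unlike gradient descent on smooth functions, the subgradient method is not a descent method: individual iterates can move uphill even while the best value converges.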

Added: October 31, 2020

Osokin A., Pushmeet Kohli, in: Lecture Notes in Computer Science. Proceedings of the 13th European Conference on Computer Vision (ECCV 2014), Part 2. Vol. 8690. Zürich: Springer, 2014. P. 663-678.

Interactive image segmentation is an important computer vision problem that has numerous real world applications. Models for image segmentation are generally trained to minimize the Hamming error in pixel labeling. The Hamming loss does not ensure that the topology/structure of the object being segmented is preserved and therefore is not a strong indicator of the ...

Added: October 19, 2017

Ivanova A., Fedor Stonyakin, Dmitry Pasechnyuk et al., Adaptive Mirror Descent for the Network Utility Maximization Problem. 2019.

Network utility maximization is the most important problem in network traffic management. Given the growth of modern communication networks, we consider the utility maximization problem in a network with a large number of connections (links) that are used by a huge number of users. To solve this problem an adaptive mirror descent algorithm for many ...
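For grounding, mirror descent in its simplest, non-adaptive form: the entropic setup on a probability simplex, where the update is multiplicative. The quadratic objective is a toy stand-in; the paper's method is adaptive and tailored to the network utility maximization structure.

```python
import numpy as np

# Entropic mirror descent on the probability simplex:
#   x <- x * exp(-eta * grad f(x)) / Z,
# i.e. a gradient step in the dual (log) space followed by a KL projection.
A = np.array([[2.0, 0.3, 0.1],
              [0.3, 1.0, 0.2],
              [0.1, 0.2, 0.5]])

x = np.ones(3) / 3                # start at the uniform distribution
eta = 0.2                         # fixed step (the paper's point: adapt it)
for _ in range(20000):
    g = A @ x                     # gradient of f(x) = 0.5 * x^T A x
    w = x * np.exp(-eta * g)
    x = w / w.sum()               # Bregman (KL) projection onto the simplex
```

The multiplicative form keeps iterates strictly positive and normalized for free, which is what makes the entropy mirror map natural for resource-allocation variables.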

Added: October 25, 2020

Tatiana Shpakova, Francis Bach, Osokin A., in: Proceedings of the International Conference on Uncertainty in Artificial Intelligence (UAI 2018). 2018. P. 1-11.

We consider the structured-output prediction problem through probabilistic approaches and generalize the "perturb-and-MAP" framework to more challenging weighted Hamming losses, which are crucial in applications. While in principle our approach is a straightforward marginalization, it requires solving many related MAP inference problems. We show that for log-supermodular pairwise models these operations can be performed efficiently ...
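In its simplest, unstructured form, perturb-and-MAP reduces to the Gumbel-max trick: perturb the log-potentials with i.i.d. Gumbel noise and take the argmax, which samples exactly from the Gibbs distribution. A sketch with illustrative potentials (structured variants perturb the inputs of a MAP solver instead, which is where the "many related MAP problems" come from):

```python
import numpy as np

rng = np.random.default_rng(3)

theta = np.array([1.0, 0.0, -1.0])           # illustrative log-potentials
probs = np.exp(theta) / np.exp(theta).sum()  # target Gibbs distribution

# Perturb-and-MAP: add i.i.d. Gumbel noise to the log-potentials and take
# the MAP (here just an argmax); this samples exactly from `probs`.
n = 200000
samples = np.argmax(theta + rng.gumbel(size=(n, 3)), axis=1)
freq = np.bincount(samples, minlength=3) / n
```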

Added: October 29, 2018

Dvurechensky P., Petr Ostroukhov, Kamil Safin et al., in: International Conference on Machine Learning (ICML 2020). Vol. 119. PMLR, 2020.

Projection-free optimization via different variants of the Frank-Wolfe (FW), a.k.a. Conditional Gradient method has become one of the cornerstones in optimization for machine learning since in many cases the linear minimization oracle is much cheaper to implement than projections and some sparsity needs to be preserved. In a number of applications, e.g. Poisson inverse problems ...
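The projection-free mechanics are easy to see on the ℓ1-ball, where the linear minimization oracle just returns a signed coordinate vertex and iterates stay sparse; the toy objective and radius below are assumptions for illustration.

```python
import numpy as np

# Vanilla Frank-Wolfe over the l1-ball {s : ||s||_1 <= radius}: no
# projections, and each iterate is a convex combination of few vertices.
b = np.array([0.8, -0.1, 0.05, 0.0, 0.0, 0.3])
radius = 1.0

def lmo(g):                            # argmin of <g, s> over the l1-ball
    s = np.zeros_like(g)
    i = int(np.argmax(np.abs(g)))
    s[i] = -radius * np.sign(g[i])     # signed coordinate vertex
    return s

x = np.zeros_like(b)
for t in range(5000):
    g = x - b                          # gradient of 0.5 * ||x - b||^2
    x = x + (2.0 / (t + 2)) * (lmo(g) - x)   # standard 2/(t+2) step size
```

After `t` steps the iterate has at most `t + 1` nonzero coordinates, which is the sparsity-preservation property the abstract highlights.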

Added: October 31, 2020

Protasov V., R. Jungers, Linear Algebra and its Applications, 2013, Vol. 438, No. 11, P. 4448-4468.

We introduce a new approach to evaluating the largest Lyapunov exponent of a family of nonnegative matrices. The method is based on special positive homogeneous functionals, which give iterative lower and upper bounds for the Lyapunov exponent. These improve previously known bounds and converge to the true value. The rate of convergence ...
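For orientation, the quantity in question can be estimated crudely by Monte Carlo along a random matrix product; the matrix pair below is illustrative, and the paper's functional-based bounds are far sharper than such simulation.

```python
import numpy as np

rng = np.random.default_rng(4)

# Two nonnegative matrices; the largest Lyapunov exponent measures the average
# exponential growth rate of long random products drawn from this family.
mats = [np.array([[1.0, 1.0], [0.0, 1.0]]),
        np.array([[1.0, 0.0], [1.0, 1.0]])]

k = 100000
x = np.ones(2)
log_growth = 0.0
for _ in range(k):
    x = mats[rng.integers(2)] @ x
    norm = x.sum()                # l1 norm: x stays entrywise nonnegative
    log_growth += np.log(norm)
    x = x / norm                  # renormalize to avoid overflow
lyap_estimate = log_growth / k
```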

Added: February 23, 2016

Dvurechensky P., Gasnikov A., Journal of Optimization Theory and Applications, 2016, Vol. 171, No. 1, P. 121-145.

In this paper, we introduce new methods for convex optimization problems with stochastic inexact oracle. Our first method is an extension of the Intermediate Gradient Method proposed by Devolder, Glineur and Nesterov for problems with deterministic inexact oracle. Our method can be applied to problems with composite objective function, both deterministic and stochastic inexactness of ...

Added: October 31, 2020

Ivanova A., Gasnikov A., Dvurechensky P. et al., Oracle Complexity Separation in Convex Optimization. 2020.

Regularized empirical risk minimization problems, ubiquitous in machine learning, are often composed of several blocks which can be treated using different types of oracles, e.g., full gradient, stochastic gradient or coordinate derivative. Optimal oracle complexity is known and achievable separately for the full gradient case, the stochastic gradient case, etc. We propose a generic framework ...

Added: October 25, 2020

Gasnikov A., Dvurechensky P., Gorbunov E. et al., in: Conference on Learning Theory, 25-28 June 2019, Phoenix, USA. Vol. 99. 2019. P. 1374-1391.

We consider convex optimization problems with the objective function having a Lipschitz-continuous p-th order derivative, where p≥1. We propose a new tensor method, which closes the gap between the lower Ω(ε^(−2/(3p+1))) and upper O(ε^(−1/(p+1))) iteration complexity bounds for this class of optimization problems. We also consider uniformly convex functions, and show how the proposed ...

Added: October 31, 2020

NY: Springer, 2012

The volume is dedicated to Stephen Smale on the occasion of his 80th birthday. Besides his startling 1960 result, the proof of the Poincaré conjecture for all dimensions greater than or equal to five, Smale's groundbreaking contributions to various fields of Mathematics have marked the second half of the 20th century and beyond. ...

Added: December 19, 2012

Tuan-Hung Vu, Osokin A., Ivan Laptev, in: Proceedings of the IEEE International Conference on Computer Vision (ICCV 2015). Santiago de Chile: IEEE, 2015. P. 2893-2901.

Person detection is a key problem for many computer vision tasks. While face detection has reached maturity, detecting people under full variation of camera view-points, human poses, lighting conditions and occlusions is still a difficult challenge. In this work we focus on detecting human heads in natural scenes. Starting from the recent R-CNN object detector, ...

Added: October 19, 2017

Sergey Guminov, Nesterov Y., Dvurechensky P. et al., Doklady Mathematics, 2019, Vol. 99, No. 2, P. 125-128.

A new version of accelerated gradient descent is proposed. The method does not require any a priori information on the objective function, uses a linesearch procedure for convergence acceleration in practice, converges in accordance with well-known lower bounds for both convex and nonconvex objective functions, and has primal-dual properties. A universal version of this method is ...
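The general flavor of replacing fixed coefficients by line searches can be sketched on a quadratic. This is an illustrative scheme, not the paper's exact algorithm: a one-dimensional search along the momentum direction alternates with a line-searched gradient step, and no Lipschitz constant is supplied anywhere.

```python
import numpy as np

# Quadratic test function f(x) = 0.5 * x^T A x - b^T x.
A = np.array([[10.0, 2.0],
              [2.0, 1.0]])
b = np.array([1.0, -1.0])

def grad(x):
    return A @ x - b

def linesearch(x, d):             # argmin_t f(x + t * d), closed form here
    dAd = d @ A @ d
    return 0.0 if dAd == 0 else -(grad(x) @ d) / dAd

x_prev = x = np.zeros(2)
for _ in range(50):
    d = x - x_prev                            # momentum direction
    y = x + linesearch(x, d) * d              # 1-D search along momentum
    g = grad(y)
    x_prev, x = x, y - linesearch(y, -g) * g  # line-searched gradient step
```

Each line search can only decrease the objective (t = 0 is always admissible), so the scheme is at least as good as steepest descent while needing no smoothness constants up front.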

Added: October 31, 2020

Fedor Stonyakin, Darina Dvinskikh, Dvurechensky P. et al., in: Mathematical Optimization Theory and Operations Research, 18th International Conference, MOTOR 2019, Ekaterinburg, Russia, July 8-12, 2019. Vol. 11548. Springer, 2019. P. 97-114.

We consider optimization methods for convex minimization problems under inexact information on the objective function. We introduce an inexact model of the objective, which as particular cases includes the inexact oracle of [16] and the relative smoothness condition of [36]. We analyze a gradient method which uses this inexact model and obtain convergence rates for convex and strongly convex problems. ...

Added: October 31, 2020