### Article

## Dual Approaches to the Minimization of Strongly Convex Functionals with a Simple Structure under Affine Constraints

A strongly convex function with a simple structure (for example, a separable one) is minimized under affine constraints. A dual problem is constructed and solved by a fast gradient method. We establish the properties of this method that, under rather general conditions, allow the solution of the primal problem to be recovered from the sequence the method generates in the dual space, with the same accuracy as the dual solution. Although this approach seems natural, its rigorous and complete theoretical justification in the required generality relies on some rather subtle, previously unpublished results, which are presented here.
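To make the dual approach concrete, here is a minimal NumPy sketch under assumptions of our own. The objective is the separable quadratic $f(x)=\sum_i \frac{\mu_i}{2}(x_i-c_i)^2$, the constraints are $Ax=b$, and the concave dual $\varphi(\lambda)=\min_x f(x)+\langle\lambda, Ax-b\rangle$ is maximized by a standard fast gradient method in similar-triangles form. The primal point is recovered as the $a_k$-weighted average of the inner minimizers $x(\lambda)$ taken at the points where dual gradients are evaluated. The names `primal_from_dual` and `dual_fgm` and the test data are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def primal_from_dual(lam, A, c, mu):
    """Closed-form minimizer over x of sum_i mu_i/2 (x_i - c_i)^2 + <lam, A x - b>."""
    return c - (A.T @ lam) / mu


def dual_fgm(A, b, c, mu, iters=5000):
    """Maximize the dual phi(lam) with a fast gradient method (similar triangles,
    constant step 1/L) and return (averaged primal point, last dual iterate)."""
    A, b, c, mu = (np.asarray(v, dtype=float) for v in (A, b, c, mu))
    L = np.linalg.norm(A, 2) ** 2 / mu.min()  # Lipschitz constant of grad phi
    y = np.zeros(A.shape[0])   # main dual sequence
    u = np.zeros(A.shape[0])   # sequence accumulating weighted gradient steps
    A_k = 0.0                  # sum of the weights a_j so far
    x_hat = np.zeros(A.shape[1])
    for _ in range(iters):
        a = (1.0 + np.sqrt(1.0 + 4.0 * L * A_k)) / (2.0 * L)  # solves L a^2 = A_k + a
        A_next = A_k + a
        lam = (A_k * y + a * u) / A_next        # point where the gradient is taken
        x_lam = primal_from_dual(lam, A, c, mu)
        grad = A @ x_lam - b                    # gradient of the concave dual
        u = u + a * grad                        # ascent step on the dual
        y = (A_k * y + a * u) / A_next
        x_hat = (A_k * x_hat + a * x_lam) / A_next  # running weighted primal average
        A_k = A_next
    return x_hat, y
```

For instance, with `A = [[1, 1, 0, 0], [0, 1, 1, 1]]`, `b = [1, 2]`, `c = 0` and `mu = 1`, the exact solution is $x^\ast=(0.2, 0.8, 0.6, 0.6)$ with multiplier $\lambda^\ast=(-0.2,-0.6)$, and `dual_fgm` returns an averaged point close to $x^\ast$. Averaging the primal minimizers with the same weights $a_k$ used in the dual updates is what lets feasibility and accuracy be controlled through the dual estimates.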

In this paper, we propose several improvements on the block-coordinate Frank-Wolfe (BCFW) algorithm from Lacoste-Julien et al. (2013) recently used to optimize the structured support vector machine (SSVM) objective in the context of structured prediction, though it has wider applications. The key intuition behind our improvements is that the estimates of block gaps maintained by BCFW reveal the block suboptimality that can be used as an *adaptive* criterion. First, we sample objects at each iteration of BCFW in an adaptive non-uniform way via gap-based sampling. Second, we incorporate pairwise and away-step variants of Frank-Wolfe into the block-coordinate setting. Third, we cache oracle calls with a cache-hit criterion based on the block gaps. Fourth, we provide the first method to compute an approximate regularization path for SSVM. Finally, we provide an exhaustive empirical evaluation of all our methods on four structured prediction datasets.
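The gap-based sampling step can be illustrated with a minimal sketch. It assumes that the per-block gap estimates are kept in an array of nonnegative numbers. The helper name `sample_block` and the uniform fallback for an all-zero array are our own illustrative choices, not details taken from the paper.

```python
import numpy as np


def sample_block(gaps, rng):
    """Pick a block index with probability proportional to its estimated gap.

    Falls back to uniform sampling when every gap estimate is zero
    (an assumption of this sketch)."""
    gaps = np.asarray(gaps, dtype=float)
    total = gaps.sum()
    if total <= 0.0:
        return int(rng.integers(len(gaps)))
    return int(rng.choice(len(gaps), p=gaps / total))


rng = np.random.default_rng(0)
i = sample_block([0.5, 0.1, 2.0], rng)  # block 2 is the most likely choice
```

In a BCFW loop, only the visited block's gap estimate is recomputed in a given iteration, so the other estimates may be stale. Sampling proportionally to them concentrates oracle calls on blocks that were suboptimal when last visited.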

We introduce a new approach to evaluating the largest Lyapunov exponent of a family of nonnegative matrices. The method is based on special positive homogeneous functionals on the nonnegative orthant $\mathbb{R}^d_+$, which give iterative lower and upper bounds for the Lyapunov exponent. These bounds improve previously known ones and converge to the true value. The rate of convergence is estimated, and the efficiency of the algorithm is demonstrated on several problems from applications (in functional analysis, combinatorics, and language theory) and on numerical examples with randomly generated matrices. The method computes the Lyapunov exponent with a prescribed accuracy in relatively high dimensions (up to 60). We generalize this approach to all matrices, not necessarily nonnegative, derive a new universal upper bound for the Lyapunov exponent, and show that an analogous lower bound does not exist in general.
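As a point of comparison (this is *not* the functional-based method described above), a classical upper bound can be computed by brute force. Assume the factors are chosen independently and uniformly from a family of $m$ matrices. Then $a_k=\mathbb{E}\log\|A_{d_k}\cdots A_{d_1}\|$ is subadditive in $k$, so by Fekete's lemma $a_k/k$ bounds the Lyapunov exponent from above for every $k$. The sketch uses the induced 1-norm. The function name is hypothetical, and the cost grows as $m^k$.

```python
import itertools
import math

import numpy as np


def lyapunov_upper_bound(mats, k):
    """Return (1/k) * average over all m**k words of log ||A_{d_k} ... A_{d_1}||_1.

    Assumes i.i.d. uniform factors and that no product is the zero matrix
    (its log-norm would be -inf)."""
    n = mats[0].shape[0]
    total = 0.0
    count = 0
    for word in itertools.product(range(len(mats)), repeat=k):
        P = np.eye(n)
        for d in word:
            P = mats[d] @ P  # left-multiply: P = A_{d_j} ... A_{d_1}
        total += math.log(np.linalg.norm(P, 1))  # induced 1-norm: max column sum
        count += 1
    return total / (count * k)


A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[1.0, 0.0], [1.0, 1.0]])
print(lyapunov_upper_bound([A, B], 2))  # every length-2 product has 1-norm 3
```

Subadditivity also gives $a_{2k}/(2k)\le a_k/k$, so doubling the word length never worsens this bound. The exponential cost in $k$ is what makes sharper, cheaper bounds such as those proposed in the paper valuable.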

Volume 99: Conference on Learning Theory, 25-28 June 2019, Phoenix, USA

A new version of accelerated gradient descent is proposed. The method requires no a priori information about the objective function, uses a linesearch procedure to accelerate convergence in practice, converges at rates consistent with the well-known lower bounds for both convex and nonconvex objective functions, and has primal-dual properties. A universal version of this method is also described.
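The abstract does not specify the linesearch, so the sketch below substitutes a standard device with a similar goal. It is a similar-triangles accelerated gradient method for a convex, smooth $f$ that backtracks on the Lipschitz estimate $L$ instead of requiring it in advance. It is not the method proposed in the paper. The name `adaptive_agd`, the starting value `L0`, and the $10^{-12}$ slack in the acceptance test (added to tolerate floating-point roundoff) are our own illustrative choices.

```python
import numpy as np


def adaptive_agd(f, grad, x0, iters=500, L0=1.0):
    """Accelerated gradient method (similar triangles) for convex smooth f,
    with backtracking on the Lipschitz estimate L instead of a known constant."""
    y = u = np.asarray(x0, dtype=float)
    A, L = 0.0, L0
    for _ in range(iters):
        L = L / 2.0  # optimistic decrease before each search
        while True:
            a = (1.0 + np.sqrt(1.0 + 4.0 * L * A)) / (2.0 * L)  # L a^2 = A + a
            A_new = A + a
            x = (A * y + a * u) / A_new
            g = grad(x)
            u_new = u - a * g
            y_new = (A * y + a * u_new) / A_new
            d = y_new - x
            # Accept L if the quadratic upper model holds (tiny slack for roundoff).
            if f(y_new) <= f(x) + g @ d + 0.5 * L * (d @ d) + 1e-12:
                break
            L *= 2.0  # estimate too small: double it and retry the step
        y, u, A = y_new, u_new, A_new
    return y


Q = np.diag([1.0, 10.0])
b = np.array([1.0, 1.0])
x_min = adaptive_agd(lambda x: 0.5 * x @ Q @ x - b @ x, lambda x: Q @ x - b, np.zeros(2))
```

Halving $L$ before each search lets the estimate adapt downward as well as upward. For an $L_f$-smooth $f$, the inner loop always terminates because the acceptance test holds once $L \ge L_f$, so the estimate never exceeds $2L_f$.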

In this paper, we consider a large class of hierarchical congestion population games. One can show that the equilibrium in a game of this type can be described as a minimum point of a properly constructed multi-level convex optimization problem. We propose a fast primal-dual composite gradient method and apply it to the problem dual to the one describing the equilibrium in the considered class of games. We prove that this method allows one to find an approximate solution of the initial problem without increasing the complexity.

We provide novel theoretical insights on structured prediction in the context of efficient convex surrogate loss minimization with consistency guarantees. For any task loss, we construct a convex surrogate that can be optimized via stochastic gradient descent, and we prove tight bounds on the so-called "calibration function" relating the excess surrogate risk to the actual risk. In contrast to prior related work, we carefully monitor the effect of the exponential number of classes on the learning guarantees as well as on the optimization complexity. As an interesting consequence, we formalize the intuition that some task losses make learning harder than others, and that the classical 0-1 loss is ill-suited for structured prediction.

In this paper, we introduce new methods for convex optimization problems with a stochastic inexact oracle. Our first method is an extension of the Intermediate Gradient Method proposed by Devolder, Glineur and Nesterov for problems with a deterministic inexact oracle. Our method can be applied to problems with a composite objective function and with both deterministic and stochastic inexactness of the oracle, and it allows a non-Euclidean setup. We estimate the rate of convergence in terms of the expectation of the non-optimality gap and provide a way to control the probability of large deviations from this rate. We also introduce two modifications of this method for strongly convex problems. For the first modification, we estimate the rate of convergence for the expectation of the non-optimality gap; for the second, we bound the probability of large deviations from the rate of convergence in terms of the expectation of the non-optimality gap. All the rates lead to complexity estimates for the proposed methods that coincide, up to a multiplicative constant, with the lower complexity bound for the considered class of convex composite optimization problems with a stochastic inexact oracle.