Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108

Hanzely F.; Richtarik P.

Issue 108. PMLR, 2020.

Eduard Gorbunov, Hanzely F., Richtarik P.

In this paper we introduce a unified analysis of a large family of variants of proximal stochastic gradient descent (SGD) which so far have required different intuitions and convergence analyses, have different applications, and have been developed separately in various communities. We show that our framework includes methods with and without the following tricks, and their combinations: variance reduction, importance sampling, mini-batch sampling, quantization, and coordinate sub-sampling. As a by-product, we obtain the first unified theory of SGD and randomized coordinate descent (RCD) methods, the first unified theory of variance reduced and non-variance-reduced SGD methods, and the first unified theory of quantized and non-quantized methods. A key to our approach is a parametric assumption on the iterates and stochastic gradients. In a single theorem we establish a linear convergence result under this assumption and strong quasi-convexity of the loss function. Whenever we recover an existing method as a special case, our theorem gives the best known complexity result. Our approach can be used to motivate the development of new useful methods, and offers pre-proved convergence guarantees. To illustrate the strength of our approach, we develop five new variants of SGD, and through numerical experiments demonstrate some of their properties.
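The abstract above does not spell out any algorithm, so the following is only a minimal sketch of plain mini-batch proximal SGD, one of the simplest members of the family it mentions. It assumes an L1-regularized least-squares objective; the names `prox_sgd` and `soft_threshold`, the toy data, and all parameter values are illustrative and do not come from the paper.

```python
import random


def soft_threshold(v, t):
    """Proximal operator of t * |v| for a scalar v (soft-thresholding)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def prox_sgd(A, b, lam, step, batch_size, iters, seed=0):
    """Mini-batch proximal SGD (illustrative) for
    (1/n) * sum_i 0.5 * (a_i . x - b_i)^2 + lam * ||x||_1."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    x = [0.0] * d
    for _ in range(iters):
        # Uniform mini-batch sampled without replacement.
        batch = rng.sample(range(n), batch_size)
        grad = [0.0] * d
        for i in batch:
            residual = sum(A[i][j] * x[j] for j in range(d)) - b[i]
            for j in range(d):
                grad[j] += residual * A[i][j] / batch_size
        # Gradient step on the smooth part, then the prox of the L1 term.
        x = [soft_threshold(x[j] - step * grad[j], step * lam) for j in range(d)]
    return x


# Hypothetical toy data: noiseless linear measurements of x_true.
data_rng = random.Random(1)
x_true = [1.0, -2.0, 0.5]
A = [[data_rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(20)]
b = [sum(a_j * x_j for a_j, x_j in zip(row, x_true)) for row in A]
x_hat = prox_sgd(A, b, lam=0.0, step=0.1, batch_size=5, iters=3000)
```

With `lam=0.0` and noiseless data the iterate `x_hat` should approach `x_true`; a positive `lam` makes the soft-thresholding step shrink small coordinates toward zero.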

Deterministic Decoding for Discrete Data in Variational Autoencoders

Polykovskiy D., Vetrov D., in: Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108. Issue 108. PMLR, 2020. P. 3046–3056.

Added: February 5, 2021

Research target: Computer Science

Language: English

Keywords: stochastic optimization convex optimization

Vaidya’s method for convex stochastic optimization problems in small dimension

Gladin E., Gasnikov A., Ermakova E., Mathematical notes 2022 Vol. 112 No. 1 P. 183–190

The paper deals with a general problem of convex stochastic optimization in a space of small dimension (for example, 100 variables). It is known that for deterministic problems of convex optimization in small dimensions, the methods of centers of gravity type (for example, Vaidya’s method) provide the best convergence. For stochastic optimization problems, the question ...

Added: November 29, 2024

Decentralized and parallel primal and dual accelerated methods for stochastic convex programming problems

Dvinskikh D., Gasnikov A., Journal of Inverse and Ill-posed problems 2021 Vol. 29 No. 3 P. 385–405

We introduce primal and dual stochastic gradient oracle methods for decentralized convex optimization problems. Both for primal and dual oracles, the proposed methods are optimal in terms of the number of communication steps. However, for all classes of the objective, the optimality in terms of the number of oracle calls per node takes place only ...

Added: October 29, 2021

Adaptive Mirror Descent for the Network Utility Maximization Problem

Ivanova A., Stonyakin F., Pasechnyuk D. et al. / Series "Optimization and Control". 2019.

Network utility maximization is the most important problem in network traffic management. Given the growth of modern communication networks, we consider the utility maximization problem in a network with a large number of connections (links) that are used by a huge number of users. To solve this problem an adaptive mirror descent algorithm for many ...

Added: October 25, 2020

Oracle Complexity Separation in Convex Optimization

Ivanova A., Gasnikov A., Dvurechensky P. et al. / Series "Optimization and Control". 2020.

Regularized empirical risk minimization problems, ubiquitous in machine learning, are often composed of several blocks which can be treated using different types of oracles, e.g., full gradient, stochastic gradient or coordinate derivative. Optimal oracle complexity is known and achievable separately for the full gradient case, the stochastic gradient case, etc. We propose a generic framework ...

Added: October 25, 2020

Stochastic intermediate gradient method for convex problems with stochastic inexact oracle

Dvurechensky P., Gasnikov A., Journal of Optimization Theory and Applications 2016 Vol. 171 No. 1 P. 121–145

In this paper, we introduce new methods for convex optimization problems with stochastic inexact oracle. Our first method is an extension of the Intermediate Gradient Method proposed by Devolder, Glineur and Nesterov for problems with deterministic inexact oracle. Our method can be applied to problems with composite objective function, both deterministic and stochastic inexactness of ...

Added: October 31, 2020

Near-Optimal High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise

Gorbunov E., Danilova M., Shibaev I. et al. / Series "arXiv:2106.05958". 2021.

Thanks to their practical efficiency and random nature of the data, stochastic first-order methods are standard for training large-scale machine learning models. Random behavior may cause a particular run of an algorithm to result in a highly suboptimal objective value, whereas theoretical guarantees are usually proved for the expectation of the objective value. Thus, it ...

Added: October 25, 2021

A survey of convex optimization of Markov decision processes

Rudenko V., Yudin N., Vasin A. A., Computer Research and Modeling 2023 Vol. 15 No. 2 P. 329–353

This article reviews both historical achievements and modern results in the field of Markov Decision Process (MDP) and convex optimization. This review is the first attempt to cover the field of reinforcement learning in Russian in the context of convex optimization. The fundamental Bellman equation and the criteria of optimality of policy — strategies based on it, ...

Added: November 29, 2024

Accelerated and Unaccelerated Stochastic Gradient Descent in Model Generality

Dvinskikh D., Tyurin A., Gasnikov A. et al., Mathematical notes 2020 Vol. 108 No. 3-4 P. 511–522

A new method for deriving estimates of the rate of convergence of optimal methods for solving problems of smooth (strongly) convex stochastic optimization is described. The method is based on the results of stochastic optimization derived from results on the convergence of optimal methods under the conditions of inexact gradients with small noises of nonrandom ...

Added: February 5, 2021

Accuracy Certificates for Convex Minimization with Inexact Oracle

Gladin E., Gasnikov A., Dvurechensky P., Journal of Optimization Theory and Applications 2024 Article 1

Accuracy certificates for convex minimization problems allow for online verification of the accuracy of approximate solutions and provide a theoretically valid online stopping criterion. When solving the Lagrange dual problem, accuracy certificates produce a simple way to recover an approximate primal solution and estimate its accuracy. In this paper, we generalize accuracy certificates for the ...

Added: November 29, 2024

An accelerated directional derivative method for smooth stochastic convex optimization

Dvurechensky P., Eduard Gorbunov, Gasnikov A., European Journal of Operational Research 2021 Vol. 290 No. 2 P. 601–621

We consider smooth stochastic convex optimization problems in the context of algorithms which are based on directional derivatives of the objective function. This context can be considered as an intermediate one between derivative-free optimization and gradient-based optimization. We assume that at any given point and for any given direction, a stochastic approximation for the directional ...

Added: September 25, 2020

Near-optimal tensor methods for minimizing the gradient norm of convex function

Dvurechensky P., Gasnikov A., Ostroukhov P. et al. / Series "Optimization and Control". 2020.

Motivated by convex problems with linear constraints and, in particular, by entropy-regularized optimal transport, we consider the problem of finding ε-approximate stationary points, i.e. points with the norm of the objective gradient less than ε, of convex functions with Lipschitz p-th order derivatives. Lower complexity bounds for this problem were recently proposed in [Grapiglia and Nesterov, arXiv:1907.07053]. However, the ...

Added: October 25, 2020

Ellipsoid method for convex stochastic optimization problems in small dimension

Gladin E., Zainullina K. E., Computer Research and Modeling 2021 Vol. 13 No. 6 P. 1137–1147

The article considers minimization of the expectation of convex function. Problems of this type often arise in machine learning and a variety of other applications. In practice, stochastic gradient descent (SGD) and similar procedures are usually used to solve such problems. We propose to use the ellipsoid method with mini-batching, which converges linearly and can ...

Added: November 29, 2024

Universal intermediate gradient method for convex problems with inexact oracle

Kamzolov D., Dvurechensky P., Gasnikov A., Optimization Methods and Software 2021 Vol. 36 No. 6 P. 1289–1316

In this paper, we propose new first-order methods for minimization of a convex function on a simple convex set. We assume that the objective function is a composite function given as a sum of a simple convex function and a convex function with inexact Hölder-continuous subgradient. We propose the Universal Intermediate Gradient Method. Our method enjoys ...

Added: August 4, 2020

A randomized coordinate descent method with volume sampling

Rodomanov A., Kropotov D., SIAM Journal on Optimization 2020 Vol. 30 No. 3 P. 1878–1904

We analyze the coordinate descent method with a new coordinate selection strategy, called volume sampling. This strategy prescribes selecting subsets of variables of certain size proportionally to the determinants of principal submatrices of the matrix, which bounds the curvature of the objective function. In the particular case when the size of the subsets equals one, ...

Added: July 29, 2020

Optimization of the fluid model of scheduling: local predictions

Bogachev T. / Series math "arxiv.org". 2022.

In this research a continuous model for resource allocations in a queuing system is considered and a local prediction on the system behavior is developed. As a result we obtain a set of possible cases, some of which lead to quite clear optimization problems. Currently, the main result of this research direction is an algorithm ...

Added: October 21, 2022

Accelerated Gradient-Free Optimization Methods with a Non-Euclidean Proximal Operator

Vorontsova E., Gasnikov A., Dvurechensky P. et al., Automation and Remote Control 2019 Vol. 80 No. 8 P. 1487–1501

We propose an accelerated gradient-free method with a non-Euclidean proximal operator associated with the p-norm (1 ⩽ p ⩽ 2). We obtain estimates for the rate of convergence of the method under low noise arising in the calculation of the function value. We present the results of computational experiments. ...

Added: December 10, 2019

Accelerated zeroth-order method for non-smooth stochastic convex optimization problem with infinite variance

Kornilov N., Shamir O., Lobanov A. et al., in: Advances in Neural Information Processing Systems 36 (NeurIPS 2023). Curran Associates, Inc., 2023.

In this paper, we consider non-smooth stochastic convex optimization with two function evaluations per round under infinite noise variance. In the classical setting when noise has finite variance, an optimal algorithm, built upon the batched accelerated gradient method, was proposed in (Gasnikov et al., 2022). This optimality is defined in terms of iteration and oracle ...

Added: March 26, 2024

Solving Convex Min-Min Problems with Smoothness and Strong Convexity in One Group of Variables and Low Dimension in the Other

Gladin E., Alkousa M., Gasnikov A., Automation and Remote Control 2021 Vol. 82 P. 1679–1691

The article deals with some approaches to solving convex problems of the min-min type with smoothness and strong convexity in only one of the two groups of variables. It is shown that the proposed approaches based on Vaidya’s method, the fast gradient method, and the accelerated gradient method with variance reduction have linear convergence. It ...

Added: November 29, 2024

Accelerated primal-dual gradient descent with linesearch for convex, nonconvex, and nonsmooth optimization problems

Guminov S., Nesterov Y., Dvurechensky P. et al., Doklady Mathematics 2019 Vol. 99 No. 2 P. 125–128

A new version of accelerated gradient descent is proposed. The method does not require any a priori information on the objective function, uses a linesearch procedure for convergence acceleration in practice, converges according to well-known lower bounds for both convex and nonconvex objective functions, and has primal-dual properties. A universal version of this method is ...

Added: October 31, 2020

Stochastic saddle-point optimization for the Wasserstein barycenter problem

Tiapkin D., Gasnikov A., Dvurechensky P., Optimization Letters 2022 Vol. 16 No. 7 P. 2145–2175

We consider the population Wasserstein barycenter problem for random probability measures supported on a finite set of points and generated by an online stream of data. This leads to a complicated stochastic optimization problem where the objective is given as an expectation of a function given as a solution to a random optimization problem. We ...

Added: October 16, 2022

Proceedings of Machine Learning Research

Kovalev D., Shulgin E., Richtarik P. et al., PMLR, 2021.

We propose ADOM – an accelerated method for smooth and strongly convex decentralized optimization over time-varying networks. ADOM uses a dual oracle, i.e., we assume access to the gradient of the Fenchel conjugate of the individual loss functions. Up to a constant factor, which depends on the network structure only, its communication complexity is the ...

Added: October 31, 2021

Linearly Converging Error Compensated SGD

Eduard Gorbunov, Kovalev D., Makarenko D. et al., in: Advances in Neural Information Processing Systems 33 (NeurIPS 2020). Curran Associates, Inc., 2020. P. 20889–20900.

Added: December 7, 2020

Alternating minimization methods for strongly convex optimization

Tupitsa N., Dvurechensky P., Gasnikov A. et al., Journal of Inverse and Ill-posed problems 2021 Vol. 29 No. 5 P. 721–739

We consider alternating minimization procedures for convex and non-convex optimization problems with the vector of variables divided into several blocks, each block being amenable for minimization with respect to its variables while maintaining other variables' blocks constant. In the case of two blocks, we prove a linear convergence rate for alternating minimization procedure under the ...

Added: September 29, 2021
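As a rough illustration of the two-block alternating minimization described in the entry above, here is a minimal sketch on a toy strongly convex quadratic. The function name `alternating_minimization` and the objective f(x, y) = (x - 1)^2 + (y + 2)^2 + x*y are assumptions made for this example and are not taken from the paper.

```python
def alternating_minimization(argmin_x, argmin_y, x0, y0, sweeps):
    """Alternate exact minimization over two blocks (illustrative sketch).

    argmin_x(y) returns the minimizer in x with y held fixed;
    argmin_y(x) returns the minimizer in y with x held fixed.
    """
    x, y = x0, y0
    for _ in range(sweeps):
        x = argmin_x(y)  # minimize over the first block
        y = argmin_y(x)  # minimize over the second block
    return x, y


# Toy objective (hypothetical): f(x, y) = (x - 1)^2 + (y + 2)^2 + x*y.
# Setting each partial derivative to zero gives the block minimizers:
#   df/dx = 2(x - 1) + y = 0  ->  x = 1 - y/2
#   df/dy = 2(y + 2) + x = 0  ->  y = -2 - x/2
x_star, y_star = alternating_minimization(
    lambda y: 1.0 - y / 2.0,
    lambda x: -2.0 - x / 2.0,
    x0=0.0,
    y0=0.0,
    sweeps=60,
)
```

For this toy problem the joint minimizer is x = 8/3, y = -10/3, and each full sweep shrinks the distance to it by a factor of four, which is a simple instance of linear convergence.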

Mapping of inaccessible buildings by radio tomography

Ingacheva A., Kokhan V. V., Ershov E. et al., Sensory Systems 2018 Vol. 32 No. 4 P. 332–341

In this paper we consider the task of mapping the interior objects of a building using a group of autonomous agents that move around it and emit a narrow beam of radio waves at the WiFi frequency (2.4 GHz). A linear model of pixel-wise radio-wave attenuation is considered. The SIRT algorithm with TV and Tikhonov regularization is used for the ...

Added: February 9, 2020