Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping, in book. Curran Associates, Inc., 2020. P. 15042-15053.
Gorbunov E., Hanzely F., Richtarik P., PMLR, 2020
In this paper we introduce a unified analysis of a large family of variants of proximal stochastic gradient descent (SGD) which so far have required different intuitions and convergence analyses, have different applications, and have been developed separately in various communities. We show that our framework includes methods with and without the following tricks, and ...
Added: December 7, 2020
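The clipping step named in the title above admits a compact illustration. Below is a minimal sketch of plain (non-accelerated) SGD with gradient clipping under heavy-tailed noise; the toy quadratic objective, Student-t noise, and all parameter values are hypothetical choices for illustration, not the paper's accelerated method.

```python
import numpy as np

def clip(g, lam):
    """Scale gradient g so its Euclidean norm is at most lam (no-op if shorter)."""
    norm = np.linalg.norm(g)
    return g if norm <= lam else (lam / norm) * g

def clipped_sgd(grad_oracle, x0, lam, gamma, n_steps, rng):
    """Plain SGD where each stochastic gradient is clipped before the step."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        g = grad_oracle(x, rng)
        x = x - gamma * clip(g, lam)
    return x

# Toy quadratic with Student-t (df=2, infinite variance) gradient noise.
def noisy_grad(x, rng):
    return 2.0 * x + rng.standard_t(df=2, size=x.shape)

rng = np.random.default_rng(0)
x_final = clipped_sgd(noisy_grad, x0=[10.0, -10.0], lam=5.0,
                      gamma=0.05, n_steps=2000, rng=rng)
```

Clipping bounds every update by gamma * lam, which is what keeps a single run robust to the rare huge gradients a heavy-tailed oracle produces.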
Gorbunov E., Danilova M., Shibaev I. et al., / arXiv. Series "arXiv:2106.05958". 2021.
Thanks to their practical efficiency and the random nature of the data, stochastic first-order methods are standard for training large-scale machine learning models. This random behavior may cause a particular run of an algorithm to result in a highly suboptimal objective value, whereas theoretical guarantees are usually proved for the expectation of the objective value. Thus, it ...
Added: October 25, 2021
Dvinskikh D., Gasnikov A., Journal of Inverse and Ill-Posed Problems 2021 Vol. 29 No. 3 P. 385-405
We introduce primal and dual stochastic gradient oracle methods for decentralized convex optimization problems. Both for primal and dual oracles, the proposed methods are optimal in terms of the number of communication steps. However, for all classes of the objective, the optimality in terms of the number of oracle calls per node takes place only ...
Added: October 29, 2021
Gorbunov E., Kovalev D., Makarenko D. et al., in: Advances in Neural Information Processing Systems 33 (NeurIPS 2020). Curran Associates, Inc., 2020. P. 20889-20900.
Added: December 7, 2020
Kornilov N., Shamir O., Lobanov A. et al., in: Advances in Neural Information Processing Systems 36 (NeurIPS 2023). Curran Associates, Inc., 2023.
In this paper, we consider non-smooth stochastic convex optimization with two function evaluations per round under infinite noise variance. In the classical setting when noise has finite variance, an optimal algorithm, built upon the batched accelerated gradient method, was proposed in (Gasnikov et al., 2022). This optimality is defined in terms of iteration and oracle ...
Added: March 26, 2024
Ivanova A., Gasnikov A., Dvurechensky P. et al., / Working papers by Cornell University. Series "Optimization and Control". 2020.
Regularized empirical risk minimization problems, ubiquitous in machine learning, are often composed of several blocks which can be treated using different types of oracles, e.g., full gradient, stochastic gradient, or coordinate derivative. Optimal oracle complexity is known and achievable separately for the full gradient case, the stochastic gradient case, etc. We propose a generic framework ...
Added: October 25, 2020
Tiapkin D., Gasnikov A., Dvurechensky P., Optimization Letters 2022 Vol. 16 No. 7 P. 2145-2175
We consider the population Wasserstein barycenter problem for random probability measures supported on a finite set of points and generated by an online stream of data. This leads to a complicated stochastic optimization problem where the objective is given as an expectation of a function defined as the solution to a random optimization problem. We ...
Added: October 16, 2022
Decrouez G. G., Borovkov K., Theory of Probability and Its Applications 2012 Vol. 57 No. 3 P. 396-418
We consider a transformed Ornstein--Uhlenbeck process model that can be a good candidate for modeling real-life processes characterized by a combination of time-reverting behavior with heavy distribution tails. We begin with presenting the results of an exploratory statistical analysis of the log prices of a major Australian public company, demonstrating several key features typical of ...
Added: September 29, 2014
Ivanova A., Dvurechensky P., Vorontsova E. et al., Journal of Optimization Theory and Applications 2022 Vol. 193 No. 1-3 P. 462-490
Many convex optimization problems have structured objective functions written as a sum of functions with different oracle types (e.g., full gradient, coordinate derivative, stochastic gradient) and different arithmetic complexities of these oracles. In the strongly convex case, these functions also have different condition numbers that eventually define the iteration complexity of first-order methods and ...
Added: October 28, 2022
Moulines E., Pereyra M., Durmus A., SIAM Journal on Imaging Sciences 2018 Vol. 11 No. 1 P. 473-506
Modern imaging methods rely strongly on Bayesian inference techniques to solve challenging imaging problems. Currently, the predominant Bayesian computation approach is convex optimization, which scales very efficiently to high-dimensional image models and delivers accurate point estimation results. However, in order to perform more complex analyses, for example, image uncertainty quantification or model selection, it is ...
Added: December 11, 2018
Dvinskikh D., Tominin V., Tominin I. et al., in: Mathematical Optimization Theory and Operations Research, 21st International Conference, MOTOR 2022, Petrozavodsk, Russia, July 2–6, 2022, Proceedings. Vol. 13367. Springer, 2022. P. 18-33.
Added: October 28, 2022
Bogachev T., / Cornell University. Series math "arxiv.org". 2022.
In this research a continuous model for resource allocations in a queuing system is considered and a local prediction on the system behavior is developed. As a result we obtain a set of possible cases, some of which lead to quite clear optimization problems. Currently, the main result of this research direction is an algorithm ...
Added: October 21, 2022
Ivanova A., Pasechnyuk D., Dvurechensky P. et al., / Cornell University. Series "Working papers by Cornell University". 2019.
In this paper, we consider the resource allocation problem in a network with a large number of connections used by a huge number of users. The resource allocation problem we consider is a maximization problem with linear inequality constraints. To solve this problem we construct the dual problem and propose to use ...
Added: October 23, 2020
Tiapkin D., Gasnikov A., in: International Conference on Artificial Intelligence and Statistics, 28-30 March 2022, A Virtual Conference. Vol. 151: Proceedings of The 25th International Conference on Artificial Intelligence and Statistics. PMLR, 2022. P. 9723-9740.
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs). For this purpose, some variant of Stochastic Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals. An important detail is the ability to use inexact values of functional constraints and compute the value of dual variables. We analyze ...
Added: October 16, 2022
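The Stochastic Mirror Descent variant mentioned above operates over probability simplices; its basic building block, an entropic mirror-descent step, can be sketched in a few lines. This is a generic illustration, not the paper's algorithm; the loss vector and step size below are hypothetical.

```python
import numpy as np

def entropic_md_step(x, g, eta):
    """One mirror-descent step with the entropy mirror map on the
    probability simplex: multiplicative update, then renormalization."""
    w = x * np.exp(-eta * g)
    return w / w.sum()

# Toy run on a fixed linear loss <g, x>: mass should concentrate
# on the coordinate with the smallest loss (index 1 here).
g = np.array([0.9, 0.1, 0.5])
x = np.full(3, 1.0 / 3.0)
for _ in range(200):
    x = entropic_md_step(x, g, eta=0.5)
```

The update stays on the simplex by construction, which is why the entropy mirror map is the standard choice for distributions over actions or policies.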
Sadiev A., Beznosikov A., Dvurechensky P. et al., Communications in Computer and Information Science 2021 Vol. 1476 P. 71-85
Saddle-point problems have recently gained increased attention from the machine learning community, mainly due to applications in training Generative Adversarial Networks using stochastic gradients. At the same time, in some applications only a zeroth-order oracle is available. In this paper, we propose several algorithms to solve stochastic smooth (strongly) convex-concave saddle-point problems using zeroth-order ...
Added: October 14, 2021
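The zeroth-order setting above replaces gradients with function evaluations. A standard randomized two-point gradient estimator is sketched below for a plain minimization objective (the paper's saddle-point machinery is omitted; the linear test function and smoothing radius are hypothetical).

```python
import numpy as np

def two_point_grad_estimate(f, x, tau, rng):
    """Randomized two-point gradient estimator: sample a uniform
    direction e on the unit sphere and return
    d * (f(x + tau*e) - f(x - tau*e)) / (2*tau) * e."""
    d = x.size
    e = rng.standard_normal(d)
    e /= np.linalg.norm(e)
    return d * (f(x + tau * e) - f(x - tau * e)) / (2.0 * tau) * e

# For a linear f the estimator is exactly unbiased: averaging many
# draws should recover the true gradient a.
a = np.array([1.0, 2.0, 3.0])
f = lambda x: a @ x
rng = np.random.default_rng(1)
est = np.mean([two_point_grad_estimate(f, np.zeros(3), 0.1, rng)
               for _ in range(50000)], axis=0)
```

Two evaluations per round at x ± tau*e is exactly the oracle budget these methods assume; for smooth nonlinear f the estimate is biased by O(tau).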
Kamzolov D., Dvurechensky P., Gasnikov A., Optimization Methods and Software 2021 Vol. 36 No. 6 P. 1289-1316
In this paper, we propose new first-order methods for minimization of a convex function on a simple convex set. We assume that the objective function is a composite function given as a sum of a simple convex function and a convex function with an inexact Hölder-continuous subgradient. We propose the Universal Intermediate Gradient Method. Our method enjoys ...
Added: August 4, 2020
Vorontsova E., Gasnikov A., Dvurechensky P. et al., Automation and Remote Control 2019 Vol. 80 No. 8 P. 1487-1501
We propose an accelerated gradient-free method with a non-Euclidean proximal operator associated with the p-norm (1 ⩽ p ⩽ 2). We obtain estimates for the rate of convergence of the method under low noise arising in the calculation of the function value. We present the results of computational experiments. ...
Added: December 10, 2019
Ingacheva A., Kokhan V., Osipov D., in: Proceedings of the 32nd European Conference on Modelling and Simulation (ECMS 2018), Wilhelmshaven, Germany, 22–25 May 2018. NY: Curran Associates, Inc., 2018. P. 183-189.
In this paper we consider the task of mapping objects inside a building using a group of autonomous agents moving around it that employ a narrow beam of radio waves at the WiFi frequency (2.4 GHz). A linear model of pixel-wise radio-wave attenuation is considered. The SIRT algorithm with TV and Tikhonov regularizations is used for the ...
Added: April 6, 2019
Dvurechensky P., Dvinskikh D., Gasnikov A. et al., in: Advances in Neural Information Processing Systems 31 (NeurIPS 2018). Neural Information Processing Systems Foundation, 2018. P. 10760-10770.
We study the decentralized distributed computation of discrete approximations for the regularized Wasserstein barycenter of a finite set of continuous probability measures distributedly stored over a network. We assume there is a network of agents/machines/computers, and each agent holds a private continuous probability measure and seeks to compute the barycenter of all the measures in ...
Added: October 31, 2020
Dvinskikh D., Tyurin A., Gasnikov A. et al., Mathematical Notes 2020 Vol. 108 No. 3-4 P. 511-522
A new method for deriving estimates of the rate of convergence of optimal methods for solving problems of smooth (strongly) convex stochastic optimization is described. The method is based on the results of stochastic optimization derived from results on the convergence of optimal methods under the conditions of inexact gradients with small noises of nonrandom ...
Added: February 5, 2021
Manita O. A., Veretennikov A., Moscow Mathematical Journal 2019 Vol. 19 No. 1 P. 89-106
The rate of convergence is studied for a diffusion process on the half-line with a non-sticky reflection to a heavy-tailed 1D invariant distribution whose density on the half-line has a polynomial decay at infinity. Starting from a standard recipe which guarantees some polynomial convergence, it is shown how to construct a new non-degenerate diffusion ...
Added: November 15, 2018
Dvinskikh D., Gorbunov E., Gasnikov A. et al., in: 2019 IEEE 58th Conference on Decision and Control (CDC). IEEE, 2019. P. 7435-7440.
We introduce primal and dual stochastic gradient oracle methods for distributed convex optimization problems over networks. We show that the proposed methods are optimal (in terms of communication steps) for primal and dual oracles. Additionally, for a dual stochastic oracle, we propose a new analysis method for the rate of convergence in terms of duality ...
Added: February 5, 2021
Osokin A., Alayrac J., Lukasewitz I. et al., in: Proceedings of Machine Learning Research. Proceedings of the International Conference on Machine Learning (ICML 2016). Vol. 48. NY: [s.n.], 2016. P. 885-925.
In this paper, we propose several improvements on the block-coordinate Frank-Wolfe (BCFW) algorithm from Lacoste-Julien et al. (2013) recently used to optimize the structured support vector machine (SSVM) objective in the context of structured prediction, though it has wider applications. The key intuition behind our improvements is that the estimates of block gaps maintained by ...
Added: October 19, 2017
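The block gaps exploited above are block-wise versions of the classical Frank-Wolfe duality gap. Below is a vanilla (non-block) sketch over the probability simplex, where the linear minimization oracle reduces to a coordinate argmin; the quadratic test objective is a hypothetical stand-in, not the SSVM objective from the paper.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, n_steps):
    """Vanilla Frank-Wolfe on the probability simplex.
    LMO: argmin_s <g, s> over the simplex is the vertex e_i with
    i = argmin_i g_i.  The FW (duality) gap <g, x - s> certifies
    suboptimality for convex objectives."""
    x = np.array(x0, dtype=float)
    gaps = []
    for t in range(n_steps):
        g = grad(x)
        i = int(np.argmin(g))            # linear minimization oracle
        s = np.zeros_like(x)
        s[i] = 1.0
        gaps.append(float(g @ (x - s)))  # duality gap at iterate t
        x += 2.0 / (t + 2.0) * (s - x)   # standard 2/(t+2) step size
    return x, gaps

# Minimize f(x) = 0.5 * ||x - c||^2 over the simplex; c lies inside it,
# so the unconstrained minimizer is feasible and x should approach c.
c = np.array([0.2, 0.3, 0.5])
x, gaps = frank_wolfe_simplex(lambda x: x - c, [1.0, 0.0, 0.0], 300)
```

The gap is computed for free from quantities the iteration already has, which is what makes it a cheap adaptive signal for sampling blocks in BCFW-style methods.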
Garces A., Azhmyakov V., IFAC-PapersOnLine 2020 Vol. 53 No. 2 P. 13173-13177
This paper deals with an application of nested convex programming to the optimal power flow (OPF) problem in multi-terminal high-voltage direct-current grids (MT-HVDC). The real-world optimization problem under consideration is non-convex. This fact implies possible inconsistencies of conventional numerical minimization algorithms (such as the interior point method). Moreover, the constructive numerical treatment of this ...
Added: October 30, 2021