Noisy Zeroth-Order Optimization for Non-smooth Saddle Point Problems

Dvinskikh D., Tominin V., Tominin I., Gasnikov A.

doi:10.1007/978-3-031-09607-5_2

Ch. 279899. P. 18–33.

Language: English

Keywords: stochastic optimization, saddle-point problems, non-smooth optimization, gradient-free optimization

In book

Mathematical Optimization Theory and Operations Research, 21st International Conference, MOTOR 2022, Petrozavodsk, Russia, July 2–6, 2022, Proceedings

Vol. 13367. Springer, 2022.

Gradient-free methods for non-smooth convex stochastic optimization with heavy-tailed noise on convex compact

Kornilov N., Gasnikov A., Dvurechensky P. et al., Computational Management Science 2023 Article 37

We present two easy-to-implement gradient-free/zeroth-order methods to optimize a stochastic non-smooth function accessible only via a black-box. The methods are built upon efficient first-order methods in the heavy-tailed case, i.e., when the gradient noise has infinite variance but bounded (1 + 𝜅)-th moment for some 𝜅 ∈ (0, 1]. The first algorithm is based ...

Added: February 7, 2025

Fast gradient-free activation maximization for neurons in spiking neural networks

Pospelov N., Chertkov A., Beketov M. et al., Neurocomputing 2025 Vol. 618 Article 129070

Elements of neural networks, both biological and artificial, can be described by their selectivity for specific cognitive features. Understanding these features is important for understanding the inner workings of neural networks. For a living system, such as a neuron, whose response to a stimulus is unknown and not differentiable, the only way to reveal these ...

Added: December 14, 2024

Vaidya’s method for convex stochastic optimization problems in small dimension

Gladin E., Gasnikov A., Ermakova E., Mathematical notes 2022 Vol. 112 No. 1 P. 183–190

The paper deals with a general problem of convex stochastic optimization in a space of small dimension (for example, 100 variables). It is known that for deterministic problems of convex optimization in small dimensions, the methods of centers of gravity type (for example, Vaidya’s method) provide the best convergence. For stochastic optimization problems, the question ...

Added: November 29, 2024

Ellipsoid method for convex stochastic optimization in small dimension

Gladin E., Zainullina K. E., Computer Research and Modeling (Компьютерные исследования и моделирование) 2021 Vol. 13 No. 6 P. 1137–1147

The article considers minimization of the expectation of a convex function. Problems of this type often arise in machine learning and a variety of other applications. In practice, stochastic gradient descent (SGD) and similar procedures are usually used to solve such problems. We propose to use the ellipsoid method with mini-batching, which converges linearly and can ...

Added: November 29, 2024

Gradient-free Federated Learning Methods with l1 and l2-randomization for Non-smooth Convex Stochastic Optimization Problems

Alashqar B., Gasnikov A., Dvinskikh D. et al., Computational Mathematics and Mathematical Physics 2023 Vol. 63 P. 1600–1653

This paper studies non-smooth problems of convex stochastic optimization. Using a smoothing technique that replaces the function value at a given point with the function value averaged over a ball (in l1-norm or l2-norm) of small radius centered at that point, the original problem is reduced to a smooth problem (whose ...

Added: March 27, 2024
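
The ball-averaging step mentioned in the abstract above can be illustrated with a small sketch. It is not the authors' algorithm: the function names (`smoothed_value`, `two_point_gradient`), the Monte Carlo sampling scheme, and the choice of the l2-ball are assumptions made for illustration. The two-point estimator shown is the standard one for l2-ball smoothing and is included only to indicate how the smoothed problem becomes accessible through function values alone.

```python
import numpy as np


def smoothed_value(f, x, gamma, n_samples, rng):
    """Monte Carlo estimate of f_gamma(x) = E[f(x + gamma * u)],
    with u uniform in the unit l2-ball (hypothetical helper)."""
    d = x.shape[0]
    g = rng.standard_normal((n_samples, d))
    # Normalized Gaussian vectors are uniform on the unit sphere.
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)
    # Radius U**(1/d) turns a uniform sphere point into a uniform ball point.
    radii = rng.random(n_samples) ** (1.0 / d)
    points = x + gamma * radii[:, None] * directions
    return float(np.mean([f(p) for p in points]))


def two_point_gradient(f, x, gamma, rng):
    """Two-point zeroth-order estimate of the gradient of f_gamma at x,
    using a direction e uniform on the unit l2-sphere (assumed estimator)."""
    d = x.shape[0]
    e = rng.standard_normal(d)
    e /= np.linalg.norm(e)
    return d * (f(x + gamma * e) - f(x - gamma * e)) / (2.0 * gamma) * e


def l1_norm(z):
    """Non-smooth test function: the l1-norm."""
    return float(np.abs(z).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.array([1.0, -2.0, 3.0])
    estimates = [two_point_gradient(l1_norm, x, 0.1, rng) for _ in range(20000)]
    # Away from the kinks, the averaged estimate approaches sign(x).
    print(np.mean(estimates, axis=0))
```

For an L-Lipschitz function the smoothed value differs from the original by at most gamma * L, so a small radius keeps the surrogate close to the original objective while making it differentiable.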

Accelerated zeroth-order method for non-smooth stochastic convex optimization problem with infinite variance

Kornilov N., Shamir O., Lobanov A. et al., in: Advances in Neural Information Processing Systems 36 (NeurIPS 2023). Curran Associates, Inc., 2023. P. 64083–64102.

Added: March 26, 2024

Orthogonal Directions Constrained Gradient Method: from non-linear equality constraints to Stiefel manifold

Schechtman S., Tiapkin D., Muehlebach M. et al., in: Proceedings of Machine Learning Research. Vol. 195: The Thirty Sixth Annual Conference on Learning Theory, 12-15 July 2023, Bangalore, India. PMLR, 2023. P. 1228–1258.

We consider the problem of minimizing a non-convex function over a smooth manifold M. We propose a novel algorithm, the Orthogonal Directions Constrained Gradient Method (ODCGM), which only requires computing a projection onto a vector space. ODCGM is infeasible but the iterates are constantly pulled towards the manifold, ensuring the convergence of ODCGM towards M. ...

Added: December 1, 2023

One-Point Gradient-Free Methods for Smooth and Non-smooth Saddle-Point Problems

Beznosikov A., Novitskii V., Gasnikov A., in: Mathematical Optimization Theory and Operations Research: 20th International Conference, MOTOR 2021, Irkutsk, Russia, July 5–10, 2021, Proceedings. Cham: Springer, 2021. Ch. 261179 P. 144–158.

In this paper, we analyze gradient-free methods with one-point feedback for stochastic saddle point problems min_x max_y φ(x, y). For non-smooth and smooth cases, we present an analysis in a general geometric setup with the arbitrary Bregman divergence. For problems with higher order smoothness, the analysis is carried out only in the Euclidean case. The estimates we have obtained repeat the best currently known estimates of gradient-free ...

Added: October 30, 2022

Primal-Dual Stochastic Mirror Descent for MDPs

Tiapkin D., Gasnikov A., in: International Conference on Artificial Intelligence and Statistics, 28-30 March 2022, A Virtual Conference. Vol. 151: Proceedings of The 25th International Conference on Artificial Intelligence and Statistics. PMLR, 2022. P. 9723–9740.

We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs). For this purpose, a variant of Stochastic Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals. An important detail is the ability to use inexact values of functional constraints and compute the value of dual variables. We analyze ...

Added: October 16, 2022

First-Order Constrained Optimization: Non-smooth Dynamical System Viewpoint

Schechtman S., Tiapkin D., Moulines E. et al., IFAC-PapersOnLine 2022 Vol. 55 No. 16 P. 236–241

In a recent paper, Muehlebach and Jordan (2021a) proposed a novel algorithm for constrained optimization that uses original ideas from nonsmooth dynamical systems. In this work, we extend Muehlebach and Jordan (2021a) in several important directions: (i) we provide existence and convergence results for continuous-time trajectories under general conditions, and (ii) we provide a convergence ...

Added: October 16, 2022

Stochastic saddle-point optimization for the Wasserstein barycenter problem

Tiapkin D., Gasnikov A., Dvurechensky P., Optimization Letters 2022 Vol. 16 No. 7 P. 2145–2175

We consider the population Wasserstein barycenter problem for random probability measures supported on a finite set of points and generated by an online stream of data. This leads to a complicated stochastic optimization problem where the objective is given as an expectation of a function given as a solution to a random optimization problem. We ...

Added: October 16, 2022

Decentralized and parallel primal and dual accelerated methods for stochastic convex programming problems

Dvinskikh D., Gasnikov A., Journal of Inverse and Ill-posed problems 2021 Vol. 29 No. 3 P. 385–405

We introduce primal and dual stochastic gradient oracle methods for decentralized convex optimization problems. Both for primal and dual oracles, the proposed methods are optimal in terms of the number of communication steps. However, for all classes of the objective, the optimality in terms of the number of oracle calls per node takes place only ...

Added: October 29, 2021

Near-Optimal High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise

Gorbunov E., Danilova M., Shibaev I. et al., arXiv:2106.05958, 2021.

Thanks to their practical efficiency and random nature of the data, stochastic first-order methods are standard for training large-scale machine learning models. Random behavior may cause a particular run of an algorithm to result in a highly suboptimal objective value, whereas theoretical guarantees are usually proved for the expectation of the objective value. Thus, it ...

Added: October 25, 2021

Zeroth-Order Algorithms for Smooth Saddle-Point Problems

Sadiev A., Beznosikov A., Dvurechensky P. et al., Communications in Computer and Information Science 2021 Vol. 1476 P. 71–85

Saddle-point problems have recently gained increased attention from the machine learning community, mainly due to applications in training Generative Adversarial Networks using stochastic gradients. At the same time, in some applications only a zeroth-order oracle is available. In this paper, we propose several algorithms to solve stochastic smooth (strongly) convex-concave saddle-point problems using zeroth-order ...

Added: October 14, 2021

Accelerated and Unaccelerated Stochastic Gradient Descent in Model Generality

Dvinskikh D., Tyurin A., Gasnikov A. et al., Mathematical notes 2020 Vol. 108 No. 3-4 P. 511–522

A new method for deriving estimates of the rate of convergence of optimal methods for solving problems of smooth (strongly) convex stochastic optimization is described. The method is based on the results of stochastic optimization derived from results on the convergence of optimal methods under the conditions of inexact gradients with small noises of nonrandom ...

Added: February 5, 2021

Linearly Converging Error Compensated SGD

Gorbunov E., Kovalev D., Makarenko D. et al., in: Advances in Neural Information Processing Systems 33 (NeurIPS 2020). Curran Associates, Inc., 2020. P. 20889–20900.

Added: December 7, 2020

Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping

Gorbunov E., Danilova M., Gasnikov A., in: Advances in Neural Information Processing Systems 33 (NeurIPS 2020). Curran Associates, Inc., 2020. P. 15042–15053.

Added: December 7, 2020

Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, PMLR 108

Gorbunov E., Hanzely F., Richtarik P., PMLR, 2020.

In this paper we introduce a unified analysis of a large family of variants of proximal stochastic gradient descent (SGD) which so far have required different intuitions, convergence analyses, have different applications, and which have been developed separately in various communities. We show that our framework includes methods with and without the following tricks, and ...

Added: December 7, 2020

Low-Variance Black-Box Gradient Estimates for the Plackett-Luce Distribution

Gadetsky A., Struminsky K., Robinson C. et al., in: Thirty-Fourth AAAI Conference on Artificial Intelligence. Vol. 34. AAAI Press, 2020. P. 10126–10135.

Added: October 11, 2020

A Superlinearly-Convergent Proximal Newton-type Method for the Optimization of Finite Sums

Rodomanov A., Kropotov D., in: Proceedings of Machine Learning Research. Proceedings of the International Conference on Machine Learning (ICML 2016). Vol. 48. NY: [s.n.], 2016. P. 2597–2605.

We consider the problem of minimizing the strongly convex sum of a finite number of convex functions. Standard algorithms for solving this problem in the class of incremental/stochastic methods have at most a linear convergence rate. We propose a new incremental method whose convergence rate is superlinear – the Newton-type incremental method (NIM). The idea ...

Added: December 10, 2018

A Superlinearly-Convergent Proximal Newton-Type Method for the Optimization of Finite Sums

Rodomanov A., Kropotov D., Journal of Machine Learning Research 2016 Vol. 48 P. 2597–2605

We consider the problem of optimizing the strongly convex sum of a finite number of convex functions. Standard algorithms for solving this problem in the class of incremental/stochastic methods have at most a linear convergence rate. We propose a new incremental method whose convergence rate is superlinear – the Newton-type incremental method (NIM). The idea ...

Added: March 11, 2017