Orthogonal Directions Constrained Gradient Method: from non-linear equality constraints to Stiefel manifold

P. 1228–1258.
Schechtman S., Tiapkin D., Muehlebach M., Moulines E.

We consider the problem of minimizing a non-convex function over a smooth manifold M. We propose a novel algorithm, the Orthogonal Directions Constrained Gradient Method (ODCGM), which only requires computing a projection onto a vector space. ODCGM is infeasible, but its iterates are constantly pulled towards the manifold, ensuring the convergence of ODCGM towards M. ODCGM is much simpler to implement than the classical methods, which require the computation of a retraction. Moreover, we show that ODCGM exhibits the near-optimal oracle complexities O(ε^{-2}) and O(ε^{-4}) in the deterministic and stochastic cases, respectively. Furthermore, we establish that, under an appropriate choice of the projection metric, our method recovers the landing algorithm of Ablin and Peyré (2022), a recently introduced algorithm for optimization over the Stiefel manifold. As a result, we significantly extend the analysis of Ablin and Peyré (2022), establishing near-optimal rates both in deterministic and stochastic frameworks. Finally, we perform numerical experiments, which show the efficiency of ODCGM in a high-dimensional setting.
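The "infeasible iterates pulled towards the manifold" idea in the abstract can be sketched in a few lines. This is only an illustrative toy on the unit sphere, not the paper's general ODCGM construction: the step combines a tangential descent direction with a normal term that pulls the iterate back to the manifold, in the spirit of the landing algorithm of Ablin and Peyré (2022). The step size `eta` and penalty weight `lam` are arbitrary illustrative choices.

```python
import numpy as np

# Landing-style sketch for minimizing f over the unit sphere
# {x : ||x||^2 = 1}. Iterates may leave the sphere, but the normal
# term (||x||^2 - 1) x keeps pulling them back; no retraction is
# ever computed, only a projection onto the tangent space.

def landing_step(x, grad, eta=0.05, lam=1.0):
    g = grad(x)
    tangential = g - (x @ g) / (x @ x) * x   # descent direction, orthogonal to x
    normal = (x @ x - 1.0) * x               # pulls the iterate toward the sphere
    return x - eta * (tangential + lam * normal)

# Toy problem: minimize x^T A x over the sphere, i.e. find the eigenvector
# of the smallest eigenvalue. A diagonal A makes the answer easy to check.
A = np.diag([2.0, 0.0, -1.0])
x = np.array([0.7, -0.4, 0.3])               # infeasible start (||x|| != 1)
for _ in range(2000):
    x = landing_step(x, lambda v: 2.0 * A @ v)

print(np.linalg.norm(x))   # close to 1: the iterate has "landed" on the sphere
print(x @ A @ x)           # close to -1, the smallest eigenvalue of A
```

Note the design point the abstract emphasizes: each step uses only a projection onto a vector space (the tangent space), which is typically much cheaper than the retraction required by classical Riemannian methods.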

Language: English
Keywords: constrained optimization; stochastic optimization; non-convex optimization
Publication based on the results of:
Structural learning and its applications (2023)

In book

Proceedings of Machine Learning Research, Vol. 195: The Thirty Sixth Annual Conference on Learning Theory, 12–15 July 2023, Bangalore, India. PMLR, 2023.
Similar publications
Gradient-free methods for non-smooth convex stochastic optimization with heavy-tailed noise on convex compact
Kornilov N., Gasnikov A., Dvurechensky P. et al., Computational Management Science 2023 Article 37
We present two easy-to-implement gradient-free/zeroth-order methods to optimize a stochastic non-smooth function accessible only via a black-box. The methods are built upon efficient first-order methods in the heavy-tailed case, i.e., when the gradient noise has infinite variance but bounded (1 + 𝜅)-th moment for some 𝜅 ∈ (0, 1]. The first algorithm is based ...
Added: February 7, 2025
Regularization and Acceleration of the Gauss–Newton Method
Yudin N., Gasnikov A., Компьютерные исследования и моделирование 2024 Vol. 16 No. 7 P. 1829–1840
We propose a family of Gauss–Newton methods for solving optimization problems and systems of nonlinear equations based on the ideas of using the upper estimate of the norm of the residual of the system of nonlinear equations and quadratic regularization. The paper presents a development of the «Three Squares Method» scheme with the addition of ...
Added: December 29, 2024
Vaidya’s method for convex stochastic optimization problems in small dimension
Gladin E., Gasnikov A., Ermakova E., Mathematical notes 2022 Vol. 112 No. 1 P. 183–190
The paper deals with a general problem of convex stochastic optimization in a space of small dimension (for example, 100 variables). It is known that for deterministic problems of convex optimization in small dimensions, the methods of centers of gravity type (for example, Vaidya’s method) provide the best convergence. For stochastic optimization problems, the question ...
Added: November 29, 2024
The Ellipsoid Method for Convex Stochastic Optimization Problems of Small Dimension
Gladin E., Zainullina K. E., Компьютерные исследования и моделирование 2021 Vol. 13 No. 6 P. 1137–1147
The article considers minimization of the expectation of convex function. Problems of this type often arise in machine learning and a variety of other applications. In practice, stochastic gradient descent (SGD) and similar procedures are usually used to solve such problems. We propose to use the ellipsoid method with mini-batching, which converges linearly and can ...
Added: November 29, 2024
Algorithm for Constrained Markov Decision Process with Linear Convergence
Gladin E., Lavrik-Karmazin M., Zainullina K. et al., Proceedings of Machine Learning Research 2023 Vol. 206 P. 11506–11533
The problem of constrained Markov decision process is considered. An agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its costs (the number of constraints is relatively small). A new dual approach is proposed with the integration of two ingredients: entropy-regularized policy optimizer and Vaidya’s dual optimizer, both of which ...
Added: November 6, 2024
Gradient-free Federated Learning Methods with l1 and l2-randomization for Non-smooth Convex Stochastic Optimization Problems
Alashqar B., Gasnikov A., Dvinskikh D. et al., Computational Mathematics and Mathematical Physics 2023 Vol. 63 P. 1600–1653
This paper studies non-smooth problems of convex stochastic optimization. Using a smoothing technique based on replacing the function value at the considered point with the function value averaged over a ball (in the l1-norm or l2-norm) of small radius centred at this point, the original problem is reduced to a smooth problem (whose ...
Added: March 27, 2024
Accelerated zeroth-order method for non-smooth stochastic convex optimization problem with infinite variance
Kornilov N., Shamir O., Lobanov A. et al., in: Advances in Neural Information Processing Systems 36 (NeurIPS 2023). Curran Associates, Inc., 2023. P. 64083–64102.
Added: March 26, 2024
One-Point Gradient-Free Methods for Smooth and Non-smooth Saddle-Point Problems
Beznosikov A., Novitskii V., Gasnikov A., in: Mathematical Optimization Theory and Operations Research: 20th International Conference, MOTOR 2021, Irkutsk, Russia, July 5–10, 2021, Proceedings. Cham: Springer, 2021. P. 144–158.
In this paper, we analyze gradient-free methods with one-point feedback for stochastic saddle point problems min_x max_y φ(x, y). For non-smooth and smooth cases, we present an analysis in a general geometric setup with an arbitrary Bregman divergence. For problems with higher-order smoothness, the analysis is carried out only in the Euclidean case. The estimates we have obtained repeat the best currently known estimates of gradient-free ...
Added: October 30, 2022
Noisy Zeroth-Order Optimization for Non-smooth Saddle Point Problems
Dvinskikh D., Tominin V., Tominin I. et al., in: Mathematical Optimization Theory and Operations Research: 21st International Conference, MOTOR 2022, Petrozavodsk, Russia, July 2–6, 2022, Proceedings. Vol. 13367. Springer, 2022. P. 18–33.
Added: October 28, 2022
Primal-Dual Stochastic Mirror Descent for MDPs
Tiapkin D., Gasnikov A., in: Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, 28–30 March 2022, A Virtual Conference. Vol. 151. PMLR, 2022. P. 9723–9740.
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs). For this purpose, some variant of Stochastic Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals. An important detail is the ability to use inexact values of functional constraints and compute the value of dual variables. We analyze ...
Added: October 16, 2022
First-Order Constrained Optimization: Non-smooth Dynamical System Viewpoint
Schechtman S., Tiapkin D., Moulines E. et al., IFAC-PapersOnLine 2022 Vol. 55 No. 16 P. 236–241
In a recent paper, Muehlebach and Jordan (2021a) proposed a novel algorithm for constrained optimization that uses original ideas from nonsmooth dynamical systems. In this work, we extend Muehlebach and Jordan (2021a) in several important directions: (i) we provide existence and convergence results for continuous-time trajectories under general conditions, and (ii) we provide a convergence ...
Added: October 16, 2022
Stochastic saddle-point optimization for the Wasserstein barycenter problem
Tiapkin D., Gasnikov A., Dvurechensky P., Optimization Letters 2022 Vol. 16 No. 7 P. 2145–2175
We consider the population Wasserstein barycenter problem for random probability measures supported on a finite set of points and generated by an online stream of data. This leads to a complicated stochastic optimization problem where the objective is given as an expectation of a function given as a solution to a random optimization problem. We ...
Added: October 16, 2022
Decentralized and parallel primal and dual accelerated methods for stochastic convex programming problems
Dvinskikh D., Gasnikov A., Journal of Inverse and Ill-posed problems 2021 Vol. 29 No. 3 P. 385–405
We introduce primal and dual stochastic gradient oracle methods for decentralized convex optimization problems. Both for primal and dual oracles, the proposed methods are optimal in terms of the number of communication steps. However, for all classes of the objective, the optimality in terms of the number of oracle calls per node takes place only ...
Added: October 29, 2021
Near-Optimal High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise
Gorbunov E., Danilova M., Shibaev I. et al., arXiv:2106.05958, 2021.
Thanks to their practical efficiency and random nature of the data, stochastic first-order methods are standard for training large-scale machine learning models. Random behavior may cause a particular run of an algorithm to result in a highly suboptimal objective value, whereas theoretical guarantees are usually proved for the expectation of the objective value. Thus, it ...
Added: October 25, 2021
MARINA: Faster Non-Convex Distributed Learning with Compression
Gorbunov E., Burlachenko K., Li Z. et al., in: Proceedings of the 38th International Conference on Machine Learning (ICML 2021). Vol. 139. PMLR, 2021. P. 3788–3798.
Added: October 25, 2021
Recent Theoretical Advances in Non-Convex Optimization
Danilova M., Dvurechensky P., Gasnikov A. et al., arXiv:2012.06188, 2020.
Motivated by recent increased interest in optimization algorithms for non-convex optimization in application to training deep neural networks and other optimization problems in data analysis, we give an overview of recent theoretical results on global performance guarantees of optimization algorithms for non-convex optimization. We start with classical arguments showing that general non-convex problems could not ...
Added: October 25, 2021
Zeroth-Order Algorithms for Smooth Saddle-Point Problems
Sadiev A., Beznosikov A., Dvurechensky P. et al., Communications in Computer and Information Science 2021 Vol. 1476 P. 71–85
Saddle-point problems have recently gained increased attention from the machine learning community, mainly due to applications in training Generative Adversarial Networks using stochastic gradients. At the same time, in some applications only a zeroth-order oracle is available. In this paper, we propose several algorithms to solve stochastic smooth (strongly) convex-concave saddle-point problems using zeroth-order ...
Added: October 14, 2021
Alternating minimization methods for strongly convex optimization
Tupitsa N., Dvurechensky P., Gasnikov A. et al., Journal of Inverse and Ill-posed problems 2021 Vol. 29 No. 5 P. 721–739
We consider alternating minimization procedures for convex and non-convex optimization problems with the vector of variables divided into several blocks, each block being amenable for minimization with respect to its variables while maintaining other variables' blocks constant. In the case of two blocks, we prove a linear convergence rate for alternating minimization procedure under the ...
Added: September 29, 2021
Accelerated and Unaccelerated Stochastic Gradient Descent in Model Generality
Dvinskikh D., Tyurin A., Gasnikov A. et al., Mathematical notes 2020 Vol. 108 No. 3-4 P. 511–522
A new method for deriving estimates of the rate of convergence of optimal methods for solving problems of smooth (strongly) convex stochastic optimization is described. The method is based on the results of stochastic optimization derived from results on the convergence of optimal methods under the conditions of inexact gradients with small noises of nonrandom ...
Added: February 5, 2021
Linearly Converging Error Compensated SGD
Gorbunov E., Kovalev D., Makarenko D. et al., in: Advances in Neural Information Processing Systems 33 (NeurIPS 2020). Curran Associates, Inc., 2020. P. 20889–20900.
Added: December 7, 2020
Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping
Gorbunov E., Danilova M., Gasnikov A., in: Advances in Neural Information Processing Systems 33 (NeurIPS 2020). Curran Associates, Inc., 2020. P. 15042–15053.
Added: December 7, 2020