MARINA: Faster Non-Convex Distributed Learning with Compression

E. Gorbunov; Burlachenko K.; Li Z.; Richtarik P.

Publications

?

MARINA: Faster Non-Convex Distributed Learning with Compression

Ch. 139. P. 3788–3798.

Gorbunov E., Burlachenko K., Li Z., Richtarik P.

Language: English

Text on another site

Keywords: non-convex optimization distributed optimization

In book

Proceedings of the 38th International Conference on Machine Learning (ICML 2021)

Vol. 139. , PMLR, 2021.

Methods with Local Steps and Random Reshuffling for Generally Smooth Non-Convex Federated Optimization

Demidovich Y., Petr Ostroukhov, Malinovsky G. et al., , in: The Thirteenth International Conference on Learning Representations: ICLR 2025.: ICLR, 2025.

Non-convex Machine Learning problems typically do not adhere to the standard smoothness assumption. Based on empirical findings, Zhang et al. (2020b) proposed a more realistic generalized $(L_0,L_1)$-smoothness assumption, though it remains largely unexplored. Many existing algorithms designed for standard smooth problems need to be revised. However, in the context of Federated Learning, only a few ...

Added: July 15, 2025

Регуляризация и ускорение метода Гаусса–Ньютона

Yudin N., Gasnikov A., Компьютерные исследования и моделирование 2024 Т. 16 № 7 С. 1829–1840

We propose a family of Gauss–Newton methods for solving optimization problems and systems of nonlinear equations based on the ideas of using the upper estimate of the norm of the residual of the system of nonlinear equations and quadratic regularization. The paper presents a development of the «Three Squares Method» scheme with the addition of ...

Added: December 29, 2024

Orthogonal Directions Constrained Gradient Method: from non-linear equality constraints to Stiefel manifold

Schechtman S., Tiapkin D., Muehlebach M. et al., , in: Proceedings of Machine Learning Research: Volume 195: The Thirty Sixth Annual Conference on Learning Theory, 12-15 July 2023, Bangalore, IndiaVol. 195: The Thirty Sixth Annual Conference on Learning Theory, 12-15 July 2023, Bangalore, India.: PMLR, 2023. P. 1228–1258.

We consider the problem of minimizing a non-convex function over a smooth manifold M. We propose a novel algorithm, the Orthogonal Directions Constrained Gradient Method (ODCGM), which only requires computing a projection onto a vector space. ODCGM is infeasible but the iterates are constantly pulled towards the manifold, ensuring the convergence of ODCGM towards M. ...

Added: December 1, 2023

Decentralized personalized federated learning: Lower bounds and optimal algorithm for all personalization modes

Sadiev A., Borodich E., Beznosikov A. et al., EURO Journal on Computational Optimization 2022 Vol. 10 Article 100041

This paper considers the problem of decentralized, personalized federated learning. For centralized personalized federated learning, a penalty that measures the deviation from the local model and its average, is often added to the objective function. However, in a decentralized setting this penalty is expensive in terms of communication costs, so here, a different penalty — ...

Added: October 28, 2022

Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices

Ryabinin M., Gorbunov E., Plokhotnyuk V. et al., , in: Advances in Neural Information Processing Systems 34 (NeurIPS 2021).: Curran Associates, Inc., 2021. P. 18195–18211.

Added: February 1, 2022

Recent Theoretical Advances in Non-Convex Optimization

Danilova M., Dvurechensky P., Gasnikov A. et al., / Series arXiv:2012.06188 "arXiv:2012.06188". 2020.

Motivated by recent increased interest in optimization algorithms for non-convex optimization in application to training deep neural networks and other optimization problems in data analysis, we give an overview of recent theoretical results on global performance guarantees of optimization algorithms for non-convex optimization. We start with classical arguments showing that general non-convex problems could not ...

Added: October 25, 2021

Local SGD: Unified Theory and New Efficient Methods

Gorbunov E., Hanzely F., Richtarik P., , in: International Conference on Artificial Intelligence and Statistics, 13-15 April 2021, VirtualVol. 130.: PMLR, 2021. Ch. 130 P. 3556–3564.

Added: October 25, 2021

Derivative-Free Method For Composite Optimization With Applications To Decentralized Distributed Optimization

Beznosikov A., Gorbunov E., Gasnikov A., IFAC-PapersOnLine 2021 Vol. 53 No. 2 P. 4038–4043

In this paper, we propose a new method based on the Sliding Algorithm from Lan (2016, 2019) for the convex composite optimization problem that includes two terms: smooth one and non-smooth one. Our method uses the stochastic noised zeroth-order oracle for the non-smooth part and the first-order oracle for the smooth part and it is ...

Added: October 14, 2021

Alternating minimization methods for strongly convex optimization

Tupitsa N., Dvurechensky P., Gasnikov A. et al., Journal of Inverse and Ill-posed problems 2021 Vol. 29 No. 5 P. 721–739

We consider alternating minimization procedures for convex and non-convex optimization problems with the vector of variables divided into several blocks, each block being amenable for minimization with respect to its variables while maintaining other variables' blocks constant. In the case of two blocks, we prove a linear convergence rate for alternating minimization procedure under the ...

Added: September 29, 2021

Linearly Converging Error Compensated SGD

Eduard Gorbunov, Kovalev D., Makarenko D. et al., , in: Advances in Neural Information Processing Systems 33 (NeurIPS 2020).: Curran Associates, Inc., 2020. P. 20889–20900.

Added: December 7, 2020

A Stochastic Derivative Free Optimization Method with Momentum

Eduard Gorbunov, Bibi A., Sener O. et al., , in: Proceedings of the 8th International Conference on Learning Representations (ICLR 2020).: ICLR, 2020. P. 1–28.

We consider the problem of unconstrained minimization of a smooth objective function in $\mathbb{R}^d$ in setting where only function evaluations are possible. We propose and analyze stochastic zeroth-order method with heavy ball momentum. In particular, we propose, SMTP, a momentum version of the stochastic three-point method (STP) Bergou et al. (2019). We show new complexity ...

Added: December 7, 2020

Stochastic Three Points Method for Unconstrained Smooth Minimization

Bergou E. H., Eduard Gorbunov, Richtarik P., SIAM Journal on Optimization 2020 Vol. 30 No. 4 P. 2726–2749

In this paper we consider the unconstrained minimization problem of a smooth function in $\mathbb{R}^n$ in a setting where only function evaluations are possible. We design a novel randomized derivative-free algorithm---the stochastic three points (STP) method---and analyze its iteration complexity. At each iteration, STP generates a random search direction according to a certain fixed probability law. Our ...

Added: November 10, 2020

Accelerated primal-dual gradient descent with linesearch for convex, nonconvex, and nonsmooth optimization problems

Guminov S., Nesterov Y., Dvurechensky P. et al., Doklady Mathematics 2019 Vol. 99 No. 2 P. 125–128

A new version of accelerated gradient descent is proposed. The method does not require any a priori information on the objective function, uses a linesearch procedure for convergence acceleration in practice, converge according to well-known lower bounds for both convex and nonconvex objective functions, and has primal-dual properties. A universal version of this method is ...

Added: October 31, 2020

Learning supervised pagerank with gradient-based and gradient-free optimization methods

Bogolubsky L., Dvurechensky P., Gasnikov A. et al., , in: Advances in Neural Information Processing Systems 29 (NIPS 2016).: NY: Curran Associates, 2016. P. 4914–4922.

In this paper, we consider a non-convex loss-minimization problem of learning Supervised PageRank models, which can account for features of nodes and edges. We propose gradient-based and random gradient-free methods to solve this problem. Our algorithms are based on the concept of an inexact oracle and unlike the state-of-the-art gradient-based method we manage to provide ...

Added: October 31, 2020

Towards accelerated rates for distributed optimization over time-varying networks

Rogozin A., Lukoshkin V., Gasnikov A. et al., / Series arXiv "math". 2020.

We study the problem of decentralized optimization over time-varying networks with strongly convex smooth cost functions. In our approach, nodes run a multi-step gossip procedure after making each gradient update, thus ensuring approximate consensus at each iteration, while the outer loop is based on accelerated Nesterov scheme. The algorithm achieves precision ε>0 in O(sqrt{κ_g}χlog2(1/ε)) communication ...

Added: October 7, 2020

Optimal distributed convex optimization on slowly time-varying graphs

Rogozin A., Uribe C., Gasnikov A. et al., IEEE Transactions on Control of Network Systems 2020 Vol. 7 No. 2 P. 829–841

We study optimal distributed first-order optimization algorithms when the network (i.e., communication constraints between the agents) changes with time. This problem is motivated by scenarios where agents experience network malfunctions. We provide a sufficient condition that guarantees a convergence rate with optimal (up to logarithmic terms) dependencies on the network and function parameters if the ...

Added: October 7, 2020

Primal–dual accelerated gradient methods with small-dimensional relaxation oracle

Nesterov Y., Gasnikov A., Guminov S. et al., Optimization Methods and Software 2021 Vol. 36 No. 4 P. 773–810

In this paper, a new variant of accelerated gradient descent is proposed. The proposed method does not require any information about the objective function, uses exact line search for the practical accelerations of convergence, converges according to the well-known lower bounds for both convex and non-convex objective functions, possesses primal–dual properties and can be applied ...

Added: August 4, 2020

On the Complexity of Approximating Wasserstein Barycenters

Kroshnin A., Tupitsa Nazarii, Dvinskikh D. et al., , in: Proceedings of Machine Learning ResearchVol. 97: International Conference on Machine Learning, 9-15 June 2019, Long Beach, California, USA.: PMLR, 2019. P. 3530–3540.

We study the complexity of approximating the Wasserstein barycenter of m discrete measures, or histograms of size n, by contrasting two alternative approaches that use entropic regularization. The first approach is based on the Iterative Bregman Projections (IBP) algorithm for which our novel analysis gives a complexity bound proportional to $m n^2 / \epsilon^2$ to approximate the original non-regularized barycenter. ...

Added: June 11, 2019

The active muon shield in the SHiP experiment

Ustyuzhanin A., Баранов А. С., Ratnikov F. et al., Journal of Instrumentation 2017 Vol. 12 No. 5 P. 1–8

The SHiP experiment is designed to search for very weakly interacting particles beyond the Standard Model which are produced in a 400 GeV/c proton beam dump at the CERN SPS. An essential task for the experiment is to keep the Standard Model background level to less than 0.1 event after 2× 1020 protons on target. In the beam ...

Added: February 26, 2018

Non-Convex Multi-Objective Optimization

Pardalos P. M., Žilinskas A., Žilinskas J., Springer, 2017.

Recent results on non-convex multi-objective optimization problems and methods are presented in this book, with particular attention to expensive black-box objective functions. Multi-objective optimization methods facilitate designers, engineers, and researchers to make decisions on appropriate trade-offs between various conflicting goals. A variety of deterministic and stochastic multi-objective optimization methods are developed in this book. Beginning ...

Added: September 29, 2017