### ?

## MARINA: Faster Non-Convex Distributed Learning with Compression

Ch. 139. P. 3788–3798.

Gorbunov E., Burlachenko K., Li Z., Richtarik P.

Language:
English

### In book

Vol. 139. , PMLR, 2021

Schechtman S., Tiapkin D., Muehlebach M. et al., , in : Proceedings of Machine Learning Research: Volume 195: The Thirty Sixth Annual Conference on Learning Theory, 12-15 July 2023, Bangalore, India. Vol. 195: The Thirty Sixth Annual Conference on Learning Theory, 12-15 July 2023, Bangalore, India.: PMLR, 2023. P. 1228–1258.

We consider the problem of minimizing a non-convex function over a smooth manifold M. We propose a novel algorithm, the Orthogonal Directions Constrained Gradient Method (ODCGM), which only requires computing a projection onto a vector space. ODCGM is infeasible but the iterates are constantly pulled towards the manifold, ensuring the convergence of ODCGM towards M. ...

Added: December 1, 2023

Sadiev A., Borodich E., Beznosikov A. et al., EURO Journal on Computational Optimization 2022 Vol. 10 Article 100041

This paper considers the problem of decentralized, personalized federated learning. For centralized personalized federated learning, a penalty that measures the deviation from the local model and its average, is often added to the objective function. However, in a decentralized setting this penalty is expensive in terms of communication costs, so here, a different penalty — ...

Added: October 28, 2022

Eduard Gorbunov, Bibi A., Sener O. et al., , in : Proceedings of the 8th International Conference on Learning Representations (ICLR 2020). : ICLR, 2020. P. 1–28.

We consider the problem of unconstrained minimization of a smooth objective function in $\mathbb{R}^d$ in setting where only function evaluations are possible. We propose and analyze stochastic zeroth-order method with heavy ball momentum. In particular, we propose, SMTP, a momentum version of the stochastic three-point method (STP) Bergou et al. (2019). We show new complexity ...

Added: December 7, 2020

Tupitsa N., Dvurechensky P., Gasnikov A. et al., Journal of Inverse and Ill-posed problems 2021 Vol. 29 No. 5 P. 721–739

We consider alternating minimization procedures for convex and non-convex optimization problems with the vector of variables divided into several blocks, each block being amenable for minimization with respect to its variables while maintaining other variables' blocks constant. In the case of two blocks, we prove a linear convergence rate for alternating minimization procedure under the ...

Added: September 29, 2021

Ustyuzhanin A., Баранов А. С., Ratnikov F. et al., Journal of Instrumentation 2017 Vol. 12 No. 5 P. 1–8

The SHiP experiment is designed to search for very weakly interacting particles beyond the Standard Model which are produced in a 400 GeV/c proton beam dump at the CERN SPS. An essential task for the experiment is to keep the Standard Model background level to less than 0.1 event after 2× 1020 protons on target. In the beam ...

Added: February 26, 2018

Gorbunov E., Hanzely F., Richtarik P., , in : International Conference on Artificial Intelligence and Statistics, 13-15 April 2021, Virtual. Vol. 130.: PMLR, 2021. Ch. 130. P. 3556–3564.

Added: October 25, 2021

Beznosikov A., Gorbunov E., Gasnikov A., IFAC-PapersOnLine 2021 Vol. 53 No. 2 P. 4038–4043

In this paper, we propose a new method based on the Sliding Algorithm from Lan (2016, 2019) for the convex composite optimization problem that includes two terms: smooth one and non-smooth one. Our method uses the stochastic noised zeroth-order oracle for the non-smooth part and the first-order oracle for the smooth part and it is ...

Added: October 14, 2021

Eduard Gorbunov, Kovalev D., Makarenko D. et al., , in : Advances in Neural Information Processing Systems 33 (NeurIPS 2020). : Curran Associates, Inc., 2020. P. 20889–20900.

Added: December 7, 2020

Rogozin A., Lukoshkin V., Gasnikov A. et al., / Cornell University. Series arXiv "math". 2020.

We study the problem of decentralized optimization over time-varying networks with strongly convex smooth cost functions. In our approach, nodes run a multi-step gossip procedure after making each gradient update, thus ensuring approximate consensus at each iteration, while the outer loop is based on accelerated Nesterov scheme. The algorithm achieves precision ε>0 in O(sqrt{κ_g}χlog2(1/ε)) communication ...

Added: October 7, 2020

Guminov S., Nesterov Y., Dvurechensky P. et al., Doklady Mathematics 2019 Vol. 99 No. 2 P. 125–128

A new version of accelerated gradient descent is proposed. The method does not require any a priori information on the objective function, uses a linesearch procedure for convergence acceleration in practice, converge according to well-known lower bounds for both convex and nonconvex objective functions, and has primal-dual properties. A universal version of this method is ...

Added: October 31, 2020

Nesterov Y., Gasnikov A., Guminov S. et al., Optimization Methods and Software 2021 Vol. 36 No. 4 P. 773–810

In this paper, a new variant of accelerated gradient descent is proposed. The proposed method does not require any information about the objective function, uses exact line search for the practical accelerations of convergence, converges according to the well-known lower bounds for both convex and non-convex objective functions, possesses primal–dual properties and can be applied ...

Added: August 4, 2020

Bergou E. H., Eduard Gorbunov, Richtarik P., SIAM Journal on Optimization 2020 Vol. 30 No. 4 P. 2726–2749

In this paper we consider the unconstrained minimization problem of a smooth function in $\mathbb{R}^n$ in a setting where only function evaluations are possible. We design a novel randomized derivative-free algorithm---the stochastic three points (STP) method---and analyze its iteration complexity. At each iteration, STP generates a random search direction according to a certain fixed probability law. Our ...

Added: November 10, 2020

Bogolubsky L., Dvurechensky P., Gasnikov A. et al., , in : Advances in Neural Information Processing Systems 29 (NIPS 2016). : NY: Curran Associates, 2016. P. 4914–4922.

In this paper, we consider a non-convex loss-minimization problem of learning Supervised PageRank models, which can account for features of nodes and edges. We propose gradient-based and random gradient-free methods to solve this problem. Our algorithms are based on the concept of an inexact oracle and unlike the state-of-the-art gradient-based method we manage to provide ...

Added: October 31, 2020

Kroshnin A., Tupitsa Nazarii, Dvinskikh D. et al., , in : Proceedings of Machine Learning Research. Vol. 97: International Conference on Machine Learning, 9-15 June 2019, Long Beach, California, USA.: PMLR, 2019. P. 3530–3540.

We study the complexity of approximating the Wasserstein barycenter of m discrete measures, or histograms of size n, by contrasting two alternative approaches that use entropic regularization. The first approach is based on the Iterative Bregman Projections (IBP) algorithm for which our novel analysis gives a complexity bound proportional to $m n^2 / \epsilon^2$ to approximate the original non-regularized barycenter. ...

Added: June 11, 2019

Pardalos P. M., Žilinskas A., Žilinskas J., Springer, 2017

Recent results on non-convex multi-objective optimization problems and methods are presented in this book, with particular attention to expensive black-box objective functions. Multi-objective optimization methods facilitate designers, engineers, and researchers to make decisions on appropriate trade-offs between various conflicting goals. A variety of deterministic and stochastic multi-objective optimization methods are developed in this book. Beginning ...

Added: September 29, 2017

Danilova M., Dvurechensky P., Gasnikov A. et al., / arXiv. Series arXiv:2012.06188 "arXiv:2012.06188". 2020.

Motivated by recent increased interest in optimization algorithms for non-convex optimization in application to training deep neural networks and other optimization problems in data analysis, we give an overview of recent theoretical results on global performance guarantees of optimization algorithms for non-convex optimization. We start with classical arguments showing that general non-convex problems could not ...

Added: October 25, 2021

Ryabinin M., Gorbunov E., Plokhotnyuk V. et al., , in : Advances in Neural Information Processing Systems 34 (NeurIPS 2021). : Curran Associates, Inc., 2021. P. 18195–18211.

Added: February 1, 2022

Rogozin A., Uribe C., Gasnikov A. et al., IEEE Transactions on Control of Network Systems 2020 Vol. 7 No. 2 P. 829–841

We study optimal distributed first-order optimization algorithms when the network (i.e., communication constraints between the agents) changes with time. This problem is motivated by scenarios where agents experience network malfunctions. We provide a sufficient condition that guarantees a convergence rate with optimal (up to logarithmic terms) dependencies on the network and function parameters if the ...

Added: October 7, 2020