HSE University
Publications
Book chapter

MARINA: Faster Non-Convex Distributed Learning with Compression

Ch. 139. P. 3788–3798.
Gorbunov E., Burlachenko K., Li Z., Richtarik P.
Language: English
Full text available on another site
Keywords: non-convex optimization, distributed optimization

In book

Proceedings of the 38th International Conference on Machine Learning (ICML 2021)
Vol. 139. PMLR, 2021.
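The title refers to communication compression in distributed training. As a generic illustration of that primitive (not the paper's own MARINA estimator; the function name and parameters here are illustrative), a minimal unbiased rand-k sparsifier can be sketched as:

```python
import numpy as np

def rand_k(g: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Unbiased rand-k sparsifier: keep k random coordinates, rescale by d/k
    so that E[rand_k(g)] = g. A standard compressor in distributed learning."""
    d = g.size
    out = np.zeros_like(g)
    idx = rng.choice(d, size=k, replace=False)  # k coordinates survive
    out[idx] = g[idx] * (d / k)                 # rescale for unbiasedness
    return out

rng = np.random.default_rng(0)
g = np.arange(1.0, 11.0)          # a "gradient" with 10 coordinates
# Average many compressed copies: the mean approaches the original vector.
est = np.mean([rand_k(g, k=3, rng=rng) for _ in range(20000)], axis=0)
print(np.max(np.abs(est - g)))    # small deviation: the compressor is unbiased
```

Each worker sends only k of d coordinates per round, cutting communication by roughly d/k at the cost of extra variance, which is the trade-off this line of work analyzes.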
Similar publications
Methods with Local Steps and Random Reshuffling for Generally Smooth Non-Convex Federated Optimization
Demidovich Y., Ostroukhov P., Malinovsky G. et al., in: The Thirteenth International Conference on Learning Representations (ICLR 2025). ICLR, 2025.
Non-convex Machine Learning problems typically do not adhere to the standard smoothness assumption. Based on empirical findings, Zhang et al. (2020b) proposed a more realistic generalized $(L_0,L_1)$-smoothness assumption, though it remains largely unexplored. Many existing algorithms designed for standard smooth problems need to be revised. However, in the context of Federated Learning, only a few ...
Added: July 15, 2025
Regularization and Acceleration of the Gauss–Newton Method
Yudin N., Gasnikov A., Компьютерные исследования и моделирование 2024 Vol. 16 No. 7 P. 1829–1840
We propose a family of Gauss–Newton methods for solving optimization problems and systems of nonlinear equations based on the ideas of using the upper estimate of the norm of the residual of the system of nonlinear equations and quadratic regularization. The paper presents a development of the «Three Squares Method» scheme with the addition of ...
Added: December 29, 2024
Orthogonal Directions Constrained Gradient Method: from non-linear equality constraints to Stiefel manifold
Schechtman S., Tiapkin D., Muehlebach M. et al., in: Proceedings of Machine Learning Research, Vol. 195: The Thirty-Sixth Annual Conference on Learning Theory, 12–15 July 2023, Bangalore, India. PMLR, 2023. P. 1228–1258.
We consider the problem of minimizing a non-convex function over a smooth manifold M. We propose a novel algorithm, the Orthogonal Directions Constrained Gradient Method (ODCGM), which only requires computing a projection onto a vector space. ODCGM is infeasible but the iterates are constantly pulled towards the manifold, ensuring the convergence of ODCGM towards M. ...
Added: December 1, 2023
Decentralized personalized federated learning: Lower bounds and optimal algorithm for all personalization modes
Sadiev A., Borodich E., Beznosikov A. et al., EURO Journal on Computational Optimization 2022 Vol. 10 Article 100041
This paper considers the problem of decentralized, personalized federated learning. For centralized personalized federated learning, a penalty that measures the deviation from the local model and its average, is often added to the objective function. However, in a decentralized setting this penalty is expensive in terms of communication costs, so here, a different penalty — ...
Added: October 28, 2022
Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices
Ryabinin M., Gorbunov E., Plokhotnyuk V. et al., in: Advances in Neural Information Processing Systems 34 (NeurIPS 2021). Curran Associates, Inc., 2021. P. 18195–18211.
Added: February 1, 2022
Recent Theoretical Advances in Non-Convex Optimization
Danilova M., Dvurechensky P., Gasnikov A. et al., arXiv preprint arXiv:2012.06188, 2020.
Motivated by recent increased interest in optimization algorithms for non-convex optimization in application to training deep neural networks and other optimization problems in data analysis, we give an overview of recent theoretical results on global performance guarantees of optimization algorithms for non-convex optimization. We start with classical arguments showing that general non-convex problems could not ...
Added: October 25, 2021
Local SGD: Unified Theory and New Efficient Methods
Gorbunov E., Hanzely F., Richtarik P., in: International Conference on Artificial Intelligence and Statistics, 13–15 April 2021, Virtual. Vol. 130. PMLR, 2021. Ch. 130. P. 3556–3564.
Added: October 25, 2021
Derivative-Free Method For Composite Optimization With Applications To Decentralized Distributed Optimization
Beznosikov A., Gorbunov E., Gasnikov A., IFAC-PapersOnLine 2021 Vol. 53 No. 2 P. 4038–4043
In this paper, we propose a new method based on the Sliding Algorithm from Lan (2016, 2019) for the convex composite optimization problem that includes two terms: smooth one and non-smooth one. Our method uses the stochastic noised zeroth-order oracle for the non-smooth part and the first-order oracle for the smooth part and it is ...
Added: October 14, 2021
Alternating minimization methods for strongly convex optimization
Tupitsa N., Dvurechensky P., Gasnikov A. et al., Journal of Inverse and Ill-Posed Problems 2021 Vol. 29 No. 5 P. 721–739
We consider alternating minimization procedures for convex and non-convex optimization problems with the vector of variables divided into several blocks, each block being amenable for minimization with respect to its variables while maintaining other variables' blocks constant. In the case of two blocks, we prove a linear convergence rate for alternating minimization procedure under the ...
Added: September 29, 2021
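The two-block procedure described in this abstract, minimizing over one block of variables while the other block is held fixed, can be sketched on a toy strongly convex quadratic (an illustration of the scheme, not the paper's algorithm; the objective is my own example):

```python
def alt_min(iters: int = 50):
    """Two-block alternating minimization for
    f(x, y) = (x - 1)^2 + (y + 2)^2 + x*y,
    which is strongly convex (Hessian [[2, 1], [1, 2]] is positive definite).
    Each block subproblem has a closed-form minimizer."""
    x, y = 0.0, 0.0
    for _ in range(iters):
        x = 1.0 - y / 2.0      # argmin over x with y fixed: 2(x - 1) + y = 0
        y = -2.0 - x / 2.0     # argmin over y with x fixed: 2(y + 2) + x = 0
    return x, y

x, y = alt_min()
print(x, y)  # converges to the stationary point (8/3, -10/3)
```

For two blocks the update map is a contraction here (factor 1/4 per sweep), which matches the linear convergence rate the abstract mentions for the two-block case.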
Linearly Converging Error Compensated SGD
Gorbunov E., Kovalev D., Makarenko D. et al., in: Advances in Neural Information Processing Systems 33 (NeurIPS 2020). Curran Associates, Inc., 2020. P. 20889–20900.
Added: December 7, 2020
A Stochastic Derivative Free Optimization Method with Momentum
Gorbunov E., Bibi A., Sener O. et al., in: Proceedings of the 8th International Conference on Learning Representations (ICLR 2020). ICLR, 2020. P. 1–28.
We consider the problem of unconstrained minimization of a smooth objective function in $\mathbb{R}^d$ in a setting where only function evaluations are possible. We propose and analyze a stochastic zeroth-order method with heavy ball momentum. In particular, we propose SMTP, a momentum version of the stochastic three-point method (STP) of Bergou et al. (2019). We show new complexity ...
Added: December 7, 2020
Stochastic Three Points Method for Unconstrained Smooth Minimization
Bergou E. H., Gorbunov E., Richtarik P., SIAM Journal on Optimization 2020 Vol. 30 No. 4 P. 2726–2749
In this paper we consider the unconstrained minimization problem of a smooth function in $\mathbb{R}^n$ in a setting where only function evaluations are possible. We design a novel randomized derivative-free algorithm---the stochastic three points (STP) method---and analyze its iteration complexity. At each iteration, STP generates a random search direction according to a certain fixed probability law. Our ...
Added: November 10, 2020
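The three-point step described in this abstract (draw a random direction, compare the current point with a step forward and backward along it, keep the best) can be sketched as follows; this is my reading of the abstract with an illustrative toy objective, not the authors' code:

```python
import numpy as np

def stp(f, x0, alpha=0.1, iters=500, seed=0):
    """Stochastic three points: at each iteration draw a random direction s,
    evaluate f at x, x + alpha*s, and x - alpha*s, and keep the best of the
    three points. Uses only function values (derivative-free)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        s = rng.standard_normal(x.size)
        s /= np.linalg.norm(s)                 # direction from the unit sphere
        candidates = [x, x + alpha * s, x - alpha * s]
        x = min(candidates, key=f)             # monotone: f never increases
    return x

f = lambda z: np.sum((z - 3.0) ** 2)           # smooth toy objective, min at (3, 3)
x = stp(f, x0=np.zeros(2))
print(x)                                       # close to (3, 3)
```

With a fixed step size the iterates settle in a neighborhood of the minimizer whose radius scales with `alpha`; the paper analyzes step-size schedules and the resulting iteration complexity.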
Accelerated primal-dual gradient descent with linesearch for convex, nonconvex, and nonsmooth optimization problems
Guminov S., Nesterov Y., Dvurechensky P. et al., Doklady Mathematics 2019 Vol. 99 No. 2 P. 125–128
A new version of accelerated gradient descent is proposed. The method does not require any a priori information about the objective function, uses a linesearch procedure for convergence acceleration in practice, converges according to well-known lower bounds for both convex and nonconvex objective functions, and has primal-dual properties. A universal version of this method is ...
Added: October 31, 2020
Learning supervised pagerank with gradient-based and gradient-free optimization methods
Bogolubsky L., Dvurechensky P., Gasnikov A. et al., in: Advances in Neural Information Processing Systems 29 (NIPS 2016). NY: Curran Associates, 2016. P. 4914–4922.
In this paper, we consider a non-convex loss-minimization problem of learning Supervised PageRank models, which can account for features of nodes and edges. We propose gradient-based and random gradient-free methods to solve this problem. Our algorithms are based on the concept of an inexact oracle and unlike the state-of-the-art gradient-based method we manage to provide ...
Added: October 31, 2020
Towards accelerated rates for distributed optimization over time-varying networks
Rogozin A., Lukoshkin V., Gasnikov A. et al., arXiv preprint, 2020.
We study the problem of decentralized optimization over time-varying networks with strongly convex smooth cost functions. In our approach, nodes run a multi-step gossip procedure after making each gradient update, thus ensuring approximate consensus at each iteration, while the outer loop is based on an accelerated Nesterov scheme. The algorithm achieves precision $\varepsilon > 0$ in $O(\sqrt{\kappa_g}\,\chi \log^2(1/\varepsilon))$ communication ...
Added: October 7, 2020
Optimal distributed convex optimization on slowly time-varying graphs
Rogozin A., Uribe C., Gasnikov A. et al., IEEE Transactions on Control of Network Systems 2020 Vol. 7 No. 2 P. 829–841
We study optimal distributed first-order optimization algorithms when the network (i.e., communication constraints between the agents) changes with time. This problem is motivated by scenarios where agents experience network malfunctions. We provide a sufficient condition that guarantees a convergence rate with optimal (up to logarithmic terms) dependencies on the network and function parameters if the ...
Added: October 7, 2020
Primal–dual accelerated gradient methods with small-dimensional relaxation oracle
Nesterov Y., Gasnikov A., Guminov S. et al., Optimization Methods and Software 2021 Vol. 36 No. 4 P. 773–810
In this paper, a new variant of accelerated gradient descent is proposed. The proposed method does not require any information about the objective function, uses exact line search for the practical accelerations of convergence, converges according to the well-known lower bounds for both convex and non-convex objective functions, possesses primal–dual properties and can be applied ...
Added: August 4, 2020
On the Complexity of Approximating Wasserstein Barycenters
Kroshnin A., Tupitsa N., Dvinskikh D. et al., in: Proceedings of Machine Learning Research, Vol. 97: International Conference on Machine Learning, 9–15 June 2019, Long Beach, California, USA. PMLR, 2019. P. 3530–3540.
We study the complexity of approximating the Wasserstein barycenter of m discrete measures, or histograms of size n, by contrasting two alternative approaches that use entropic regularization. The first approach is based on the Iterative Bregman Projections (IBP) algorithm for which our novel analysis gives a complexity bound proportional to $m n^2 / \epsilon^2$ to approximate the original non-regularized barycenter. ...
Added: June 11, 2019
The active muon shield in the SHiP experiment
Ustyuzhanin A., Baranov A. S., Ratnikov F. et al., Journal of Instrumentation 2017 Vol. 12 No. 5 P. 1–8
The SHiP experiment is designed to search for very weakly interacting particles beyond the Standard Model which are produced in a 400 GeV/c proton beam dump at the CERN SPS. An essential task for the experiment is to keep the Standard Model background level to less than 0.1 event after $2 \times 10^{20}$ protons on target. In the beam ...
Added: February 26, 2018
Non-Convex Multi-Objective Optimization
Pardalos P. M., Žilinskas A., Žilinskas J., Springer, 2017.
Recent results on non-convex multi-objective optimization problems and methods are presented in this book, with particular attention to expensive black-box objective functions. Multi-objective optimization methods facilitate designers, engineers, and researchers to make decisions on appropriate trade-offs between various conflicting goals. A variety of deterministic and stochastic multi-objective optimization methods are developed in this book. Beginning ...
Added: September 29, 2017