• A
  • A
  • A
  • АБВ
  • АБВ
  • АБВ
  • A
  • A
  • A
  • A
  • A
Обычная версия сайта
  • RU
  • EN
  • HSE University
  • Publications
  • Book chapter
  • A Stochastic Derivative Free Optimization Method with Momentum
  • RU
  • EN
Расширенный поиск
Высшая школа экономики
Национальный исследовательский университет
Priority areas
  • business informatics
  • economics
  • engineering science
  • humanitarian
  • IT and mathematics
  • law
  • management
  • mathematics
  • sociology
  • state and public administration
by year
  • 2027
  • 2026
  • 2025
  • 2024
  • 2023
  • 2022
  • 2021
  • 2020
  • 2019
  • 2018
  • 2017
  • 2016
  • 2015
  • 2014
  • 2013
  • 2012
  • 2011
  • 2010
  • 2009
  • 2008
  • 2007
  • 2006
  • 2005
  • 2004
  • 2003
  • 2002
  • 2001
  • 2000
  • 1999
  • 1998
  • 1997
  • 1996
  • 1995
  • 1994
  • 1993
  • 1992
  • 1991
  • 1990
  • 1989
  • 1988
  • 1987
  • 1986
  • 1985
  • 1984
  • 1983
  • 1982
  • 1981
  • 1980
  • 1979
  • 1978
  • 1977
  • 1976
  • 1975
  • 1974
  • 1973
  • 1972
  • 1971
  • 1970
  • 1969
  • 1968
  • 1967
  • 1966
  • 1965
  • 1964
  • 1963
  • 1958
  • More
Subject
News
May 18, 2026
The 'Second Shift' Is Not Why Women Avoid News
Women are more likely than men to avoid political and economic news, but the reasons for this behaviour are linked less to structural inequality or family-related stress than to personal attitudes and the emotional perception of news content. This conclusion was reached by HSE researchers after analysing data from a large-scale survey of more than 10,000 residents across 61 regions of Russia. The study findings have been published in Woman in Russian Society.
May 15, 2026
Preserving Rationality in a Period of Turbulence
The HSE International Laboratory for Logic, Linguistics and Formal Philosophy studies logic and rationality in a transformed world characterised by a diversity of logical systems and rational agents. The laboratory supports and develops academic ties with Russian and international partners. The HSE News Service spoke with the head of the laboratory, Prof. Elena Dragalina-Chernaya, about its work.
May 15, 2026
‘All My Time Is Devoted to My Dissertation
Ilya Venediktov graduated from the Master’s programme at the HSE Tikhonov Moscow Institute of Electronics and Mathematics through the combined Master’s–PhD track and is currently studying at the HSE Doctoral School of Engineering Sciences. At present, he is undertaking a long-term research internship at the University of Science and Technology of China in Hefei, where he is preparing his dissertation. In this interview, he explains how an internship differs from an academic mobility programme, discusses his research topic, and describes the daily life of a Russian doctoral student in China.

 

Have you spotted a typo?
Highlight it, click Ctrl+Enter and send us a message. Thank you for your help!

Publications
  • Books
  • Articles
  • Chapters of books
  • Working papers
  • Report a publication
  • Research at HSE

?

A Stochastic Derivative Free Optimization Method with Momentum

P. 1–28.
Eduard Gorbunov, Bibi A., Sener O., Bergou E. H., Richtarik P.

We consider the problem of unconstrained minimization of a smooth objective function in $\mathbb{R}^d$ in setting where only function evaluations are possible. We propose and analyze stochastic zeroth-order method with heavy ball momentum. In particular, we propose, SMTP, a momentum version of the stochastic three-point method (STP) Bergou et al. (2019). We show new complexity results for non-convex, convex and strongly convex functions. We test our method on a collection of learning to continuous control tasks on several MuJoCo Todorov et al. (2012) environments with varying difficulty and compare against STP, other state-of-the-art derivative-free optimization algorithms and against policy gradient methods. SMTP significantly outperforms STP and all other methods that we considered in our numerical experiments. Our second contribution is SMTP with importance sampling which we call SMTP_IS. We provide convergence analysis of this method for non-convex, convex and strongly convex objectives.

Language: English
Full text
Text on another site
Keywords: non-convex optimizationconvex optimizationderivative-free optimization

In book

Proceedings of the 8th International Conference on Learning Representations (ICLR 2020)
ICLR, 2020.
Similar publications
On Linear Convergence in Smooth Convex-Concave Bilinearly-Coupled Saddle-Point Optimization: Lower Bounds and Optimal Algorithms
Borodich E., Gasnikov A., Kovalev D., , in: Volume 267: International Conference on Machine Learning, 13-19 July 2025, Vancouver Convention Center, Vancouver, CanadaVol. 267.: [б.и.], 2025. P. 5045–5100.
Added: November 18, 2025
Gradient-free methods for non-smooth convex stochastic optimization with heavy-tailed noise on convex compact
Kornilov N., Gasnikov A., Dvurechensky P. et al., Computational Management Science 2023 Article 37
We present two easy-to-implement gradient-free/zeroth-order methods to optimize a stochastic non-smooth function accessible only via a black-box. The methods are built upon efficient first-order methods in the heavy-tailed case, i.e., when the gradi- ent noise has infinite variance but bounded (1 + 𝜅)-th moment for some 𝜅 ∈ (0, 1]. The first algorithm is based ...
Added: February 7, 2025
Регуляризация и ускорение метода Гаусса–Ньютона
Yudin N., Gasnikov A., Компьютерные исследования и моделирование 2024 Т. 16 № 7 С. 1829–1840
We propose a family of Gauss–Newton methods for solving optimization problems and systems of nonlinear equations based on the ideas of using the upper estimate of the norm of the residual of the system of nonlinear equations and quadratic regularization. The paper presents a development of the «Three Squares Method» scheme with the addition of ...
Added: December 29, 2024
Solving Convex Min-Min Problems with Smoothness and Strong Convexity in One Group of Variables and Low Dimension in the Other
Gladin E., Alkousa M., Gasnikov A., Automation and Remote Control 2021 Vol. 82 P. 1679–1691
The article deals with some approaches to solving convex problems of the min-min type with smoothness and strong convexity in only one of the two groups of variables. It is shown that the proposed approaches based on Vaidya’s method, the fast gradient method, and the accelerated gradient method with variance reduction have linear convergence. It ...
Added: November 29, 2024
Vaidya’s method for convex stochastic optimization problems in small dimension
Gladin E., Gasnikov A., Ermakova E., Mathematical notes 2022 Vol. 112 No. 1 P. 183–190
The paper deals with a general problem of convex stochastic optimization in a space of small dimension (for example, 100 variables). It is known that for deterministic problems of convex optimization in small dimensions, the methods of centers of gravity type (for example, Vaidya’s method) provide the best convergence. For stochastic optimization problems, the question ...
Added: November 29, 2024
Accuracy Certificates for Convex Minimization with Inexact Oracle
Gladin E., Gasnikov A., Dvurechensky P., Journal of Optimization Theory and Applications 2025 Vol. 204 No. 1 Article 1
Accuracy certificates for convex minimization problems allow for online verification of the accuracy of approximate solutions and provide a theoretically valid online stopping criterion. When solving the Lagrange dual problem, accuracy certificates produce a simple way to recover an approximate primal solution and estimate its accuracy. In this paper, we generalize accuracy certificates for the ...
Added: November 29, 2024
Метод эллипсоидов для задач выпуклой стохастической оптимизации малой размерности
Gladin E., Зайнуллина К. Э., Компьютерные исследования и моделирование 2021 Т. 13 № 6 С. 1137–1147
The article considers minimization of the expectation of convex function. Problems of this type often arise in machine learning and a variety of other applications. In practice, stochastic gradient descent (SGD) and similar procedures are usually used to solve such problems. We propose to use the ellipsoid method with mini-batching, which converges linearly and can ...
Added: November 29, 2024
Обзор выпуклой оптимизации марковских процессов принятия решений
Rudenko V., Yudin N., Васин А. А., Компьютерные исследования и моделирование 2023 Т. 15 № 2 С. 329–353
This article reviews both historical achievements and modern results in the field of Markov Decision Process (MDP) and convex optimization. This review is the first attempt to cover the field of reinforcement learning in Russian in the context of convex optimization. The fundamental Bellman equation and the criteria of optimality of policy — strategies based on it, ...
Added: November 29, 2024
Breaking the Heavy-Tailed Noise Barrier in Stochastic Optimization Problems
Puchkin N., Gorbunov E., Kutuzov N. et al., , in: Proceedings of The 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024), 2-4 May 2024, Palau de Congressos, Valencia, Spain. PMLR: Volume 238Vol. 238.: Valencia: PMLR, 2024. P. 856–864.
We consider stochastic optimization problems with heavy-tailed noise with structured density. For such problems, we show that it is possible to get faster rates of convergence than 𝑂(𝐾^{−2(𝛼−1)/𝛼}), when the stochastic gradients have finite 𝛼-th moment, 𝛼∈(1,2]. In particular, our analysis allows the noise norm to have an unbounded expectation. To achieve these results, we stabilize stochastic gradients, ...
Added: April 22, 2024
Accelerated zeroth-order method for non-smooth stochastic convex optimization problem with infinite variance
Kornilov N., Shamir O., Lobanov A. et al., , in: Advances in Neural Information Processing Systems 36 (NeurIPS 2023).: Curran Associates, Inc., 2023. P. 64083–64102.
Added: March 26, 2024
Orthogonal Directions Constrained Gradient Method: from non-linear equality constraints to Stiefel manifold
Schechtman S., Tiapkin D., Muehlebach M. et al., , in: Proceedings of Machine Learning Research: Volume 195: The Thirty Sixth Annual Conference on Learning Theory, 12-15 July 2023, Bangalore, IndiaVol. 195: The Thirty Sixth Annual Conference on Learning Theory, 12-15 July 2023, Bangalore, India.: PMLR, 2023. P. 1228–1258.
We consider the problem of minimizing a non-convex function over a smooth manifold M. We propose a novel algorithm, the Orthogonal Directions Constrained Gradient Method (ODCGM), which only requires computing a projection onto a vector space. ODCGM is infeasible but the iterates are constantly pulled towards the manifold, ensuring the convergence of ODCGM towards M. ...
Added: December 1, 2023
Distributed Methods with Compressed Communication for Solving Variational Inequalities, with Theoretical Guarantees
Beznosikov A., Richtarik P., Diskin M. et al., , in: Thirty-Sixth Conference on Neural Information Processing Systems : NeurIPS 2022.: Curran Associates, Inc., 2022. P. 14013–14029.
Added: January 27, 2023
On a Combination of Alternating Minimization and Nesterov’s Momentum
Guminov S., Dvurechensky P., Tupitsa N. et al., , in: Proceedings of the 38th International Conference on Machine Learning (ICML 2021)Vol. 139.: PMLR, 2021. P. 3886–3898.
Added: October 30, 2022
An Accelerated Method for Derivative-Free Smooth Stochastic Convex Optimization
Gorbunov E., Dvurechensky P., Gasnikov A., SIAM Journal on Optimization 2022 Vol. 32 No. 2 P. 1210–1238
We consider an unconstrained problem of minimizing a smooth convex function which is only available through noisy observations of its values, the noise consisting of two parts. Similar to stochastic optimization problems, the first part is of stochastic nature. The second part is additive noise of unknown nature but bounded in absolute value. In the ...
Added: October 30, 2022
Oracle Complexity Separation in Convex Optimization
Ivanova A., Dvurechensky P., Vorontsova E. et al., Journal of Optimization Theory and Applications 2022 Vol. 193 No. 1-3 P. 462–490
Many convex optimization problems have structured objective functions written as a sum of functions with different oracle types (e.g., full gradient, coordinate derivative, stochastic gradient) and different arithmetic operations complexity of these oracles. In the strongly convex case, these functions also have different condition numbers that eventually define the iteration complexity of first-order methods and ...
Added: October 28, 2022
Application of the nested convex programming to the optimal power flow in MT-HVDC grids
Garces A., Azhmyakov V., IFAC-PapersOnLine 2020 Vol. 53 No. 2 P. 13173–13177
This paper deals with an application of the nested convex programming to the optimal power flow (OPF) in multi-terminal high-voltage direct-current grids (MT-HVDC). The real-world optimization problem under consideration is non-convex. This fact implies some possible inconsistencies of the conventional numerical minimization algorithms (such as interior point method). Moreover, the constructive numerical treatment of this ...
Added: October 30, 2021
Inexact model: a framework for optimization and variational inequalities
Stonyakin F., Tyurin A., Gasnikov A. et al., Optimization Methods and Software 2021 Vol. 36 No. 6 P. 1155–1201
In this paper, we propose a general algorithmic framework for the first-order methods in optimization in a broad sense, including minimization problems, saddle-point problems and variational inequalities (VIs). This framework allows obtaining many known methods as a special case, the list including accelerated gradient method, composite optimization methods, level-set methods, Bregman proximal methods. The idea ...
Added: October 29, 2021
Decentralized and parallel primal and dual accelerated methods for stochastic convex programming problems
Dvinskikh D., Gasnikov A., Journal of Inverse and Ill-posed problems 2021 Vol. 29 No. 3 P. 385–405
We introduce primal and dual stochastic gradient oracle methods for decentralized convex optimization problems. Both for primal and dual oracles, the proposed methods are optimal in terms of the number of communication steps. However, for all classes of the objective, the optimality in terms of the number of oracle calls per node takes place only ...
Added: October 29, 2021
Near-Optimal High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise
Gorbunov E., Danilova M., Shibaev I. et al., / Series arXiv:2106.05958 "arXiv:2106.05958". 2021.
Thanks to their practical efficiency and random nature of the data, stochastic first-order methods are standard for training large-scale machine learning models. Random behavior may cause a particular run of an algorithm to result in a highly suboptimal objective value, whereas theoretical guarantees are usually proved for the expectation of the objective value. Thus, it ...
Added: October 25, 2021
  • About
  • About
  • Key Figures & Facts
  • Sustainability at HSE University
  • Faculties & Departments
  • International Partnerships
  • Faculty & Staff
  • HSE Buildings
  • HSE University for Persons with Disabilities
  • Public Enquiries
  • Studies
  • Admissions
  • Programme Catalogue
  • Undergraduate
  • Graduate
  • Exchange Programmes
  • Summer University
  • Summer Schools
  • Semester in Moscow
  • Business Internship
  • Research
  • International Laboratories
  • Research Centres
  • Research Projects
  • Monitoring Studies
  • Conferences & Seminars
  • Academic Jobs
  • Yasin (April) International Academic Conference on Economic and Social Development
  • Media & Resources
  • Publications by staff
  • HSE Journals
  • Publishing House
  • iq.hse.ru: commentary by HSE experts
  • Library
  • Economic & Social Data Archive
  • Video
  • HSE Repository of Socio-Economic Information
  • HSE1993–2026
  • Contacts
  • Copyright
  • Privacy Policy
  • Site Map
Edit