Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient Clipping, in book. Curran Associates, Inc., 2020. P. 15042-15053.
Gorbunov E., Hanzely F., Richtarik P., PMLR, 2020
In this paper we introduce a unified analysis of a large family of variants of proximal stochastic gradient descent (SGD) which so far have required different intuitions and convergence analyses, have different applications, and have been developed separately in various communities. We show that our framework includes methods with and without the following tricks, and ...
Added: December 7, 2020
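The clipping step named in the title above admits a compact illustration. Below is a minimal sketch of plain (non-accelerated) SGD with gradient clipping under heavy-tailed noise; the toy quadratic objective, Student-t noise, and all parameter values are hypothetical choices for illustration, not the paper's accelerated method.

```python
import numpy as np

def clip(g, lam):
    """Scale gradient g so its Euclidean norm is at most lam (no-op if shorter)."""
    norm = np.linalg.norm(g)
    return g if norm <= lam else (lam / norm) * g

def clipped_sgd(grad_oracle, x0, lam, gamma, n_steps, rng):
    """Plain SGD where each stochastic gradient is clipped before the step."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        g = grad_oracle(x, rng)
        x = x - gamma * clip(g, lam)
    return x

# Toy quadratic with Student-t (df=2, infinite variance) gradient noise.
def noisy_grad(x, rng):
    return 2.0 * x + rng.standard_t(df=2, size=x.shape)

rng = np.random.default_rng(0)
x_final = clipped_sgd(noisy_grad, x0=[10.0, -10.0], lam=5.0,
                      gamma=0.05, n_steps=2000, rng=rng)
```

Clipping bounds every update by gamma * lam, which is what keeps a single run robust to the rare huge gradients a heavy-tailed oracle produces.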
Gorbunov E., Danilova M., Shibaev I. et al., / arXiv. Series "arXiv:2106.05958". 2021.
Thanks to their practical efficiency and the random nature of the data, stochastic first-order methods are standard for training large-scale machine learning models. This random behavior may cause a particular run of an algorithm to result in a highly suboptimal objective value, whereas theoretical guarantees are usually proved for the expectation of the objective value. Thus, it ...
Added: October 25, 2021
Dvinskikh D., Gasnikov A., Journal of Inverse and Ill-Posed Problems 2021 Vol. 29 No. 3 P. 385-405
We introduce primal and dual stochastic gradient oracle methods for decentralized convex optimization problems. Both for primal and dual oracles, the proposed methods are optimal in terms of the number of communication steps. However, for all classes of the objective, the optimality in terms of the number of oracle calls per node takes place only ...
Added: October 29, 2021
Gorbunov E., Kovalev D., Makarenko D. et al., in: Advances in Neural Information Processing Systems 33 (NeurIPS 2020). Curran Associates, Inc., 2020. P. 20889-20900.
Added: December 7, 2020
Kornilov N., Shamir O., Lobanov A. et al., in: Advances in Neural Information Processing Systems 36 (NeurIPS 2023). Curran Associates, Inc., 2023.
In this paper, we consider non-smooth stochastic convex optimization with two function evaluations per round under infinite noise variance. In the classical setting when noise has finite variance, an optimal algorithm, built upon the batched accelerated gradient method, was proposed in (Gasnikov et al., 2022). This optimality is defined in terms of iteration and oracle ...
Added: March 26, 2024
Ivanova A., Gasnikov A., Dvurechensky P. et al., / Working papers by Cornell University. Series "Optimization and Control". 2020.
Regularized empirical risk minimization problems, ubiquitous in machine learning, are often composed of several blocks which can be treated using different types of oracles, e.g., full gradient, stochastic gradient, or coordinate derivative. Optimal oracle complexity is known and achievable separately for the full gradient case, the stochastic gradient case, etc. We propose a generic framework ...
Added: October 25, 2020
Tiapkin D., Gasnikov A., Dvurechensky P., Optimization Letters 2022 Vol. 16 No. 7 P. 2145-2175
We consider the population Wasserstein barycenter problem for random probability measures supported on a finite set of points and generated by an online stream of data. This leads to a complicated stochastic optimization problem where the objective is given as an expectation of a function defined as the solution to a random optimization problem. We ...
Added: October 16, 2022
Decrouez G. G., Borovkov K., Theory of Probability and Its Applications 2012 Vol. 57 No. 3 P. 396-418
We consider a transformed Ornstein--Uhlenbeck process model that can be a good candidate for modeling real-life processes characterized by a combination of time-reverting behavior with heavy distribution tails. We begin with presenting the results of an exploratory statistical analysis of the log prices of a major Australian public company, demonstrating several key features typical of ...
Added: September 29, 2014
Ivanova A., Dvurechensky P., Vorontsova E. et al., Journal of Optimization Theory and Applications 2022 Vol. 193 No. 1-3 P. 462-490
Many convex optimization problems have structured objective functions written as a sum of functions with different oracle types (e.g., full gradient, coordinate derivative, stochastic gradient) and different arithmetic complexities of these oracles. In the strongly convex case, these functions also have different condition numbers that eventually define the iteration complexity of first-order methods and ...
Added: October 28, 2022
Moulines E., Pereyra M., Durmus A., SIAM Journal on Imaging Sciences 2018 Vol. 11 No. 1 P. 473-506
Modern imaging methods rely strongly on Bayesian inference techniques to solve challenging imaging problems. Currently, the predominant Bayesian computation approach is convex optimization, which scales very efficiently to high-dimensional image models and delivers accurate point estimation results. However, in order to perform more complex analyses, for example, image uncertainty quantification or model selection, it is ...
Added: December 11, 2018
Dvinskikh D., Tominin V., Tominin I. et al., in: Mathematical Optimization Theory and Operations Research, 21st International Conference, MOTOR 2022, Petrozavodsk, Russia, July 2–6, 2022, Proceedings. Vol. 13367. Springer, 2022. P. 18-33.
Added: October 28, 2022
Bogachev T., / Cornell University. Series math "arxiv.org". 2022.
In this research a continuous model for resource allocations in a queuing system is considered and a local prediction on the system behavior is developed. As a result we obtain a set of possible cases, some of which lead to quite clear optimization problems. Currently, the main result of this research direction is an algorithm ...
Added: October 21, 2022
Ivanova A., Pasechnyuk D., Dvurechensky P. et al., / Cornell University. Series "Working papers by Cornell University". 2019.
In this paper, we consider the resource allocation problem in a network with a large number of connections used by a huge number of users. The resource allocation problem we consider is a maximization problem with linear inequality constraints. To solve this problem we construct the dual problem and propose to use ...
Added: October 23, 2020
Tiapkin D., Gasnikov A., in: International Conference on Artificial Intelligence and Statistics, 28-30 March 2022, A Virtual Conference. Vol. 151: Proceedings of The 25th International Conference on Artificial Intelligence and Statistics. PMLR, 2022. P. 9723-9740.
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs). For this purpose, some variant of Stochastic Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals. An important detail is the ability to use inexact values of functional constraints and compute the value of dual variables. We analyze ...
Added: October 16, 2022
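The Stochastic Mirror Descent variant mentioned above operates over probability simplices; its basic building block, an entropic mirror-descent step, can be sketched in a few lines. This is a generic illustration, not the paper's algorithm; the loss vector and step size below are hypothetical.

```python
import numpy as np

def entropic_md_step(x, g, eta):
    """One mirror-descent step with the entropy mirror map on the
    probability simplex: multiplicative update, then renormalization."""
    w = x * np.exp(-eta * g)
    return w / w.sum()

# Toy run on a fixed linear loss <g, x>: mass should concentrate
# on the coordinate with the smallest loss (index 1 here).
g = np.array([0.9, 0.1, 0.5])
x = np.full(3, 1.0 / 3.0)
for _ in range(200):
    x = entropic_md_step(x, g, eta=0.5)
```

The update stays on the simplex by construction, which is why the entropy mirror map is the standard choice for distributions over actions or policies.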
Sadiev A., Beznosikov A., Dvurechensky P. et al., Communications in Computer and Information Science 2021 Vol. 1476 P. 71-85
Saddle-point problems have recently gained increased attention from the machine learning community, mainly due to applications in training Generative Adversarial Networks using stochastic gradients. At the same time, in some applications only a zeroth-order oracle is available. In this paper, we propose several algorithms to solve stochastic smooth (strongly) convex-concave saddle-point problems using zeroth-order ...
Added: October 14, 2021
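The zeroth-order setting above replaces gradients with function evaluations. A standard randomized two-point gradient estimator is sketched below for a plain minimization objective (the paper's saddle-point machinery is omitted; the linear test function and smoothing radius are hypothetical).

```python
import numpy as np

def two_point_grad_estimate(f, x, tau, rng):
    """Randomized two-point gradient estimator: sample a uniform
    direction e on the unit sphere and return
    d * (f(x + tau*e) - f(x - tau*e)) / (2*tau) * e."""
    d = x.size
    e = rng.standard_normal(d)
    e /= np.linalg.norm(e)
    return d * (f(x + tau * e) - f(x - tau * e)) / (2.0 * tau) * e

# For a linear f the estimator is exactly unbiased: averaging many
# draws should recover the true gradient a.
a = np.array([1.0, 2.0, 3.0])
f = lambda x: a @ x
rng = np.random.default_rng(1)
est = np.mean([two_point_grad_estimate(f, np.zeros(3), 0.1, rng)
               for _ in range(50000)], axis=0)
```

Two evaluations per round at x ± tau*e is exactly the oracle budget these methods assume; for smooth nonlinear f the estimate is biased by O(tau).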
Kamzolov D., Dvurechensky P., Gasnikov A., Optimization Methods and Software 2021 Vol. 36 No. 6 P. 1289-1316
In this paper, we propose new first-order methods for minimization of a convex function on a simple convex set. We assume that the objective function is a composite function given as a sum of a simple convex function and a convex function with an inexact Hölder-continuous subgradient. We propose the Universal Intermediate Gradient Method. Our method enjoys ...
Added: August 4, 2020
Vorontsova E., Gasnikov A., Dvurechensky P. et al., Automation and Remote Control 2019 Vol. 80 No. 8 P. 1487-1501
We propose an accelerated gradient-free method with a non-Euclidean proximal operator associated with the p-norm (1 ⩽ p ⩽ 2). We obtain estimates for the rate of convergence of the method under low noise arising in the calculation of the function value. We present the results of computational experiments. ...
Added: December 10, 2019
Ingacheva A., Kokhan V., Osipov D., in: Proceedings of the 32nd European Conference on Modelling and Simulation (ECMS 2018), Wilhelmshaven, Germany, 22–25 May 2018. NY: Curran Associates, Inc., 2018. P. 183-189.
In this paper we consider the task of mapping objects inside a building using a group of autonomous agents moving around it that employ a narrow beam of radio waves at the WiFi frequency (2.4 GHz). A linear model of pixel-wise radio-wave attenuation is considered. The SIRT algorithm with TV and Tikhonov regularizations is used for the ...
Added: April 6, 2019
Dvurechensky P., Dvinskikh D., Gasnikov A. et al., in: Advances in Neural Information Processing Systems 31 (NeurIPS 2018). Neural Information Processing Systems Foundation, 2018. P. 10760-10770.
We study the decentralized distributed computation of discrete approximations for the regularized Wasserstein barycenter of a finite set of continuous probability measures distributedly stored over a network. We assume there is a network of agents/machines/computers, and each agent holds a private continuous probability measure and seeks to compute the barycenter of all the measures in ...
Added: October 31, 2020
Dvinskikh D., Tyurin A., Gasnikov A. et al., Mathematical Notes 2020 Vol. 108 No. 3-4 P. 511-522
A new method for deriving estimates of the rate of convergence of optimal methods for solving problems of smooth (strongly) convex stochastic optimization is described. The method is based on the results of stochastic optimization derived from results on the convergence of optimal methods under the conditions of inexact gradients with small noises of nonrandom ...
Added: February 5, 2021
Manita O. A., Veretennikov A., Moscow Mathematical Journal 2019 Vol. 19 No. 1 P. 89-106
The rate of convergence is studied for a diffusion process on the half-line with a non-sticky reflection to a heavy-tailed 1D invariant distribution whose density on the half-line has a polynomial decay at infinity. Starting from a standard recipe which guarantees some polynomial convergence, it is shown how to construct a new non-degenerate diffusion ...
Added: November 15, 2018
Dvinskikh D., Gorbunov E., Gasnikov A. et al., in: 2019 IEEE 58th Conference on Decision and Control (CDC). IEEE, 2019. P. 7435-7440.
We introduce primal and dual stochastic gradient oracle methods for distributed convex optimization problems over networks. We show that the proposed methods are optimal (in terms of communication steps) for primal and dual oracles. Additionally, for a dual stochastic oracle, we propose a new analysis method for the rate of convergence in terms of duality ...
Added: February 5, 2021
Osokin A., Alayrac J., Lukasewitz I. et al., in: Proceedings of Machine Learning Research. Proceedings of the International Conference on Machine Learning (ICML 2016). Vol. 48. NY: [s.n.], 2016. P. 885-925.
In this paper, we propose several improvements on the block-coordinate Frank-Wolfe (BCFW) algorithm from Lacoste-Julien et al. (2013) recently used to optimize the structured support vector machine (SSVM) objective in the context of structured prediction, though it has wider applications. The key intuition behind our improvements is that the estimates of block gaps maintained by ...
Added: October 19, 2017
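The block gaps exploited above are block-wise versions of the classical Frank-Wolfe duality gap. Below is a vanilla (non-block) sketch over the probability simplex, where the linear minimization oracle reduces to a coordinate argmin; the quadratic test objective is a hypothetical stand-in, not the SSVM objective from the paper.

```python
import numpy as np

def frank_wolfe_simplex(grad, x0, n_steps):
    """Vanilla Frank-Wolfe on the probability simplex.
    LMO: argmin_s <g, s> over the simplex is the vertex e_i with
    i = argmin_i g_i.  The FW (duality) gap <g, x - s> certifies
    suboptimality for convex objectives."""
    x = np.array(x0, dtype=float)
    gaps = []
    for t in range(n_steps):
        g = grad(x)
        i = int(np.argmin(g))            # linear minimization oracle
        s = np.zeros_like(x)
        s[i] = 1.0
        gaps.append(float(g @ (x - s)))  # duality gap at iterate t
        x += 2.0 / (t + 2.0) * (s - x)   # standard 2/(t+2) step size
    return x, gaps

# Minimize f(x) = 0.5 * ||x - c||^2 over the simplex; c lies inside it,
# so the unconstrained minimizer is feasible and x should approach c.
c = np.array([0.2, 0.3, 0.5])
x, gaps = frank_wolfe_simplex(lambda x: x - c, [1.0, 0.0, 0.0], 300)
```

The gap is computed for free from quantities the iteration already has, which is what makes it a cheap adaptive signal for sampling blocks in BCFW-style methods.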
Garces A., Azhmyakov V., IFAC-PapersOnLine 2020 Vol. 53 No. 2 P. 13173-13177
This paper deals with an application of nested convex programming to the optimal power flow (OPF) problem in multi-terminal high-voltage direct-current grids (MT-HVDC). The real-world optimization problem under consideration is non-convex. This fact implies possible inconsistencies of conventional numerical minimization algorithms (such as the interior point method). Moreover, the constructive numerical treatment of this ...
Added: October 30, 2021