Strongly Convex Optimization for the Dual Formulation of Optimal Transport
Volume 80 is assigned to the 2018 International Conference on Machine Learning (ICML 2018)
We derive two convergence results for a sequential alternating maximization procedure that approximates the maximizer of random functionals, such as the realized log-likelihood in maximum likelihood estimation. We show that the sequence attains the same deviation properties as those established for the profile M-estimator by Andresen and Spokoiny (2013), that is, finite-sample Wilks and Fisher theorems. Further, under slightly stronger smoothness constraints on the random functional, we show nearly linear convergence to the global maximizer, provided the starting point of the procedure is well chosen. ©2016 Andreas Andresen and Vladimir Spokoiny.
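As a rough illustration of what an alternating maximization procedure looks like, the sketch below applies it to a deterministic, strictly concave quadratic rather than to a random functional. The objective, the function name `alternating_maximization`, and the split of the parameter into a block `theta` and a block `eta` are assumptions made for this example only; they are not taken from the paper.

```python
import numpy as np

def alternating_maximization(A, b, p, n_iter=100):
    """Maximize L(x) = -0.5 * x^T A x + b^T x over x = (theta, eta).

    A is assumed symmetric positive definite, theta = x[:p], eta = x[p:].
    Each step maximizes L exactly in one block while the other is held fixed.
    """
    d = len(b)
    theta = np.zeros(p)
    eta = np.zeros(d - p)
    A_tt, A_te, A_ee = A[:p, :p], A[:p, p:], A[p:, p:]
    for _ in range(n_iter):
        # Setting the gradient in theta to zero gives A_tt theta = b_t - A_te eta
        theta = np.linalg.solve(A_tt, b[:p] - A_te @ eta)
        # Setting the gradient in eta to zero gives A_ee eta = b_e - A_te^T theta
        eta = np.linalg.solve(A_ee, b[p:] - A_te.T @ theta)
    return theta, eta

# Hypothetical toy problem: two scalar blocks with a weak coupling term
A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, 1.0])
theta, eta = alternating_maximization(A, b, p=1)
```

For this toy quadratic the iterates contract geometrically toward the joint maximizer `np.linalg.solve(A, b)`. The paper's setting differs in that the functional is random and the starting point matters.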
This paper introduces a class of local indicators that make it possible to efficiently compute optimal transport plans for arbitrary distributions of point-like units of supply and demand on the real line when the cost function is concave.
We study the complexity of approximating the Wasserstein barycenter of $m$ discrete measures, or histograms of size $n$, by contrasting two alternative approaches that use entropic regularization. The first approach is based on the Iterative Bregman Projections (IBP) algorithm, for which our novel analysis gives a complexity bound proportional to $m n^2 / \epsilon^2$ to approximate the original non-regularized barycenter. The second approach is based on accelerated gradient descent and yields a complexity proportional to $m n^2 / \epsilon$. As a byproduct, we show that the regularization parameter in both approaches has to be proportional to $\epsilon$, which makes both algorithms unstable when the desired accuracy is high. To overcome this issue, we propose a novel proximal-IBP algorithm, which can be seen as a proximal gradient method that uses IBP at each iteration to make a proximal step. We also consider the scalability of these algorithms using approaches from distributed optimization and show that the first algorithm can be implemented in a centralized distributed setting (master/slave), while the second one is amenable to a more general decentralized distributed setting with an arbitrary network topology.
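To make the IBP approach more concrete, here is a minimal sketch of the commonly used scaling form of Iterative Bregman Projections for an entropy-regularized barycenter. It is not the authors' implementation: the function name `ibp_barycenter`, the argument layout, the fixed iteration count, and the toy data are all assumptions for illustration, and the code works with the kernel $\exp(-C/\gamma)$ directly, so it is expected to become numerically unstable for small regularization $\gamma$, which is the regime the abstract warns about.

```python
import numpy as np

def ibp_barycenter(P, C, weights, gamma, n_iter=500):
    """Entropy-regularized barycenter via alternating scaling updates.

    P: (n, m) array whose columns are strictly positive histograms.
    C: (n, n) ground cost matrix; weights: (m,) nonnegative, summing to 1.
    The k-th coupling is diag(U[:, k]) @ K @ diag(V[:, k]).
    """
    K = np.exp(-C / gamma)              # Gibbs kernel
    U = np.ones_like(P, dtype=float)    # scalings on the barycenter side
    q = None
    for _ in range(n_iter):
        # Projection 1: match each coupling's column marginal to p_k
        V = P / (K.T @ U)
        KV = K @ V
        # Projection 2: force all row marginals to a common histogram q,
        # which is the weighted geometric mean of the current row marginals
        marginals = U * KV
        q = np.exp(np.log(marginals) @ weights)
        U = q[:, None] / KV
    return q

# Hypothetical toy data: two mirrored bumps on a grid over [0, 1]
x = np.linspace(0.0, 1.0, 50)
C = (x[:, None] - x[None, :]) ** 2
p1 = np.exp(-((x - 0.25) ** 2) / (2 * 0.05 ** 2))
p1 /= p1.sum()
p2 = p1[::-1].copy()
q = ibp_barycenter(np.stack([p1, p2], axis=1), C, np.array([0.5, 0.5]), gamma=0.05)
```

Each iteration costs $O(m n^2)$ because of the two matrix products with the kernel, which is consistent with the $m n^2$ factor in the bounds above. A production version would typically add a stopping criterion and log-domain stabilization.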