Primal-Dual Stochastic Mirror Descent for MDPs

D. Tiapkin; Alexander Gasnikov

P. 9723–9740.

Tiapkin D. N., Alexander Gasnikov

Language: English

Keywords: reinforcement learning, stochastic optimization

In book

International Conference on Artificial Intelligence and Statistics, 28-30 March 2022, A Virtual Conference

Vol. 151: Proceedings of The 25th International Conference on Artificial Intelligence and Statistics. PMLR, 2022.

Development of an ADP microservice for identifying emission sources based on reinforcement learning

Kychkin A. V., Chernitsin I. A., Prikladnaya informatika 2026 No. 1(121) P. 40–58

This paper presents the results of developing a software microservice that is embedded in ambient air quality monitoring systems to support the identification of industrial pollution sources. The emission and subsequent dispersion of harmful substances in the near-surface layers of the atmosphere is a dynamic process characterized by high uncertainty owing to the specifics of process units, their operating modes, and the influence of terrain, buildings, and meteorological factors. The dependencies between the location of the emission source and ...

Added: April 23, 2026

Artificial Neural Networks and Machine Learning. ICANN 2025 International Workshops and Special Sessions: 34th International Conference on Artificial Neural Networks, Kaunas, Lithuania, September 9–12, 2025, Proceedings, Part V

Cham: Springer, 2025.

Added: September 29, 2025

Analysis of a Company Model in Conditions of Unstable Demand Using Reinforcement Learning Methods

Delev A., Semakov S., in: 2025 8th International Conference on Artificial Intelligence and Big Data (ICAIBD). IEEE, 2025. P. 318–322.

Added: August 25, 2025

Pseudo-collusion in a centralized algorithmic financial market

Pastushkov A. V., Bulatov A. E., Finance Research Letters 2025 Vol. 83 Article 107671

Added: June 19, 2025

The beer game bullwhip effect mitigation: a deep reinforcement learning approach

Rozhkov M. I., Alyamovskaya N. S., Zakhodyakin G. V., International Journal of Production Research 2025 Vol. 63 No. 18 P. 6630–6647

Added: March 24, 2025

Gradient-free methods for non-smooth convex stochastic optimization with heavy-tailed noise on convex compact

Kornilov N., Gasnikov A. V., Dvurechensky P. E. et al., Computational Management Science 2023 Article 37

Added: February 7, 2025

Deep Reinforcement Learning-Based Congestion Control for File Transfer over QUIC

Blokhin A., Kalev V., Pusev R. S. et al., in: 2024 IEEE International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON). Novosibirsk: IEEE, 2024. P. 25–30.

Added: December 18, 2024

Vaidya’s method for convex stochastic optimization problems in small dimension

Gladin E. L., Gasnikov A. V., Ermakova E., Mathematical Notes 2022 Vol. 112 No. 1 P. 183–190

Added: November 29, 2024

Ellipsoid method for convex stochastic optimization in small dimension

Gladin E. L., Zainullina K. E., Computer Research and Modeling 2021 Vol. 13 No. 6 P. 1137–1147

The article considers the problem of minimizing the expectation of a convex function. Problems of this kind are ubiquitous in machine learning and also arise frequently in a number of other applications. In practice, procedures of the stochastic gradient descent (SGD) type are usually used to solve them. In our work, we propose solving such problems using the ellipsoid method with mini-batching. The algorithm has a linear convergence rate ...

Added: November 29, 2024

Generative Flow Networks as Entropy-Regularized RL

Tiapkin D. N., Morozov N. V., Naumov A. A. et al., in: Proceedings of The 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024), 2-4 May 2024, Palau de Congressos, Valencia, Spain. Vol. 238. Valencia: PMLR, 2024. P. 4213–4221.

Added: June 22, 2024

Gradient-free Federated Learning Methods with l1 and l2-randomization for Non-smooth Convex Stochastic Optimization Problems

Alashqar B., Gasnikov A. V., Dvinskikh D. M. et al., Computational Mathematics and Mathematical Physics 2023 Vol. 63 P. 1600–1653

Added: March 27, 2024

Accelerated zeroth-order method for non-smooth stochastic convex optimization problem with infinite variance

Kornilov N., Shamir O., Lobanov A. et al., in: Advances in Neural Information Processing Systems 36 (NeurIPS 2023). Curran Associates, Inc., 2023. P. 64083–64102.

Added: March 26, 2024

Model-free Posterior Sampling via Learning Rate Randomization

Tiapkin D. N., Belomestny D. V., Calandriello D. et al., in: Advances in Neural Information Processing Systems 36 (NeurIPS 2023). Curran Associates, Inc., 2023. P. 73719–73774.

Added: February 17, 2024

Reinforcement Procedure for Randomized Machine Learning

Yuri S. Popkov, Dubnov Yu. A., Alexey Yu. Popkov, Mathematics 2023 Vol. 11 No. 17 Article 3651

This paper is devoted to problem-oriented reinforcement methods for the numerical implementation of Randomized Machine Learning. We have developed a scheme of the reinforcement procedure based on the agent approach and Bellman’s optimality principle. This procedure ensures strictly monotonic properties of a sequence of local records in the iterative computational procedure of the learning process. ...

Added: February 5, 2024

Orthogonal Directions Constrained Gradient Method: from non-linear equality constraints to Stiefel manifold

Schechtman S., Tiapkin D. N., Muehlebach M. et al., in: Proceedings of Machine Learning Research. Vol. 195: The Thirty Sixth Annual Conference on Learning Theory, 12-15 July 2023, Bangalore, India. PMLR, 2023. P. 1228–1258.

Added: December 1, 2023

Fast Rates for Maximum Entropy Exploration

Tiapkin D. N., Belomestny D. V., Calandriello D. et al., in: Proceedings of the 40th International Conference on Machine Learning. Vol. 202: International Conference on Machine Learning, 23-29 July 2023, Honolulu, Hawaii, USA. PMLR, 2023. P. 34161–34221.

Added: December 1, 2023

Sharp Deviations Bounds for Dirichlet Weighted Sums with Application to analysis of Bayesian algorithms

Tiapkin D. N., Belomestny D. V., Naumov A. A. et al., Working papers by Cornell University. Series math "arxiv.org" 2023 Article 2304.03056

In this work, we derive sharp non-asymptotic deviation bounds for weighted sums of Dirichlet random variables. These bounds are based on a novel integral representation of the density of a weighted Dirichlet sum. This representation allows us to obtain a Gaussian-like approximation for the sum distribution using geometry and complex analysis methods. Our results generalize ...

Added: June 28, 2023