Tiapkin D., Belomestny D., Calandriello D., Moulines E., Munos R., Naumov A., Perrault P., Valko M., Menard P., Model-free Posterior Sampling via Learning Rate Randomization, in book. Curran Associates, Inc., 2023. P. 73719–73774.
Keywords: reinforcement learning
Kychkin A., Chernitsin I., Prikladnaya Informatika (Applied Informatics) 2026 Vol. 21 No. 1 P. 40–58
This paper presents a software microservice, embedded in atmospheric air quality monitoring systems, that supports the identification of industrial pollution sources. The emission and subsequent spread of harmful substances in the lower layers of the atmosphere are dynamic and characterized by high uncertainty due to the specific features of technological ...
Added: April 23, 2026
Cham: Springer, 2025.
This book constitutes the refereed proceedings of the international workshops held in conjunction with the 34th International Conference on Artificial Neural Networks and Machine Learning, ICANN 2025, in Kaunas, Lithuania, September 9–12, 2025.
The 20 full papers and 8 abstracts included in this workshop volume were carefully reviewed and selected from 42 submissions. ...
Added: September 29, 2025
Delev A., Semakov S., in: 2025 8th International Conference on Artificial Intelligence and Big Data (ICAIBD). IEEE, 2025. P. 318–322.
Profit is one of the most important economic indicators of a company's performance, and every company must allocate its resources so as to obtain the maximum possible profit. The profit maximization problem is usually a dynamic optimization problem. This article discusses an approach to solving the production expansion problem ...
Added: August 25, 2025
Pastushkov A., Boulatov A., Finance Research Letters 2025 Vol. 83 Article 107671
Recent studies have increasingly explored whether reinforcement learning algorithms can give rise to cooperative behavior that results in non-competitive pricing across various market settings. In financial markets, Cartea et al. (2022) show that market makers using multi-armed bandit (MAB) algorithms generally converge to competitive pricing in quote-driven over-the-counter (OTC) markets, barring some unlikely exceptions where ...
Added: June 19, 2025
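The abstract above concerns market makers whose quoting is driven by multi-armed bandit (MAB) algorithms. As a hedged illustration of the underlying mechanism — not the paper's model — the sketch below has an epsilon-greedy bandit learn to undercut a static rival's quote; the spread grid, rival quote, and reward rule are all illustrative assumptions.

```python
import random

# Toy sketch: a market maker running an epsilon-greedy multi-armed
# bandit picks a half-spread from a small grid and competes against a
# static rival quoting a fixed spread. Order flow goes to the strictly
# tighter quote; ties and wider quotes earn nothing.

SPREADS = [1, 2, 3, 4]   # candidate half-spreads (in ticks)
RIVAL = 2                # the rival's fixed half-spread
EPS = 0.1                # exploration rate

class BanditMaker:
    def __init__(self):
        self.counts = [0] * len(SPREADS)
        self.values = [0.0] * len(SPREADS)  # running mean reward per arm

    def act(self):
        if random.random() < EPS:
            return random.randrange(len(SPREADS))
        return max(range(len(SPREADS)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

random.seed(0)
maker = BanditMaker()
for _ in range(5000):
    arm = maker.act()
    # the maker earns its quoted spread only by strictly undercutting
    reward = SPREADS[arm] if SPREADS[arm] < RIVAL else 0.0
    maker.update(arm, reward)

# Greedy choice after training: the competitive (tightest profitable) quote.
greedy_spread = SPREADS[max(range(len(SPREADS)), key=lambda a: maker.values[a])]
```

In this one-sided toy, undercutting is always rewarded, so the bandit converges to competitive pricing; the paper's interest is in the multi-agent setting, where such convergence is the typical but not universal outcome.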
Rozhkov M., Alyamovskaya N., Zakhodiakin G., International Journal of Production Research 2025 Vol. 63 No. 18 P. 6630–6647
This article investigates the application of reinforcement learning (RL) methods to optimise a four-echelon linear supply chain model with stochastic demand. The proposed supply chain configuration is largely based on the production-distribution supply chain of the MIT Supply Chain Beer Game. We show that RL can significantly improve ordering efficiency and overall supply chain performance. ...
Added: March 24, 2025
Blokhin A., Kalev V., Pusev R. et al., in: 2024 IEEE International Multi-Conference on Engineering, Computer and Information Sciences (SIBIRCON). Novosibirsk: IEEE, 2024. P. 25–30.
Congestion control is one of the key mechanisms of the QUIC protocol: it governs how much data, and at what rate, can be sent to an endpoint at a particular moment, so that shared network resources are used efficiently and congestive collapse is avoided. In this work we tackle the problem of ...
Added: December 18, 2024
Tiapkin D., Morozov N., Naumov A. et al., in: Proceedings of The 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024), 2-4 May 2024, Palau de Congressos, Valencia, Spain. Vol. 238. Valencia: PMLR, 2024. P. 4213–4221.
The recently proposed generative flow networks (GFlowNets) are a method of training a policy to sample compositional discrete objects with probabilities proportional to a given reward via a sequence of actions. GFlowNets exploit the sequential nature of the problem, drawing parallels with reinforcement learning (RL). Our work extends the connection between RL and GFlowNets to ...
Added: June 22, 2024
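The abstract above describes GFlowNets as policies that sample discrete objects with probabilities proportional to a reward. As a minimal illustration of that target distribution only (a real GFlowNet learns it through a sequence of actions, which this sketch does not model), the toy objects and rewards below are assumptions:

```python
import random
from collections import Counter

# Illustrative target of a GFlowNet: sample objects x with
# probability P(x) = R(x) / Z, where Z is the total reward mass.
rewards = {"a": 1.0, "b": 3.0, "c": 6.0}   # hypothetical objects and rewards
Z = sum(rewards.values())                   # partition function

def sample():
    """Draw one object proportionally to its reward."""
    r = random.random() * Z
    for x, R in rewards.items():
        r -= R
        if r <= 0:
            return x
    return x  # unreachable fallback for float edge cases

random.seed(1)
n = 100_000
counts = Counter(sample() for _ in range(n))
freqs = {x: counts[x] / n for x in rewards}
# Empirical frequencies approach R(x)/Z = {a: 0.1, b: 0.3, c: 0.6}.
```

The point of the GFlowNet formulation is to achieve exactly this sampling behavior compositionally, by constructing objects step by step, which is where the connection to reinforcement learning arises.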
Popkov Y. S., Dubnov Y. A., Popkov A. Y., Mathematics 2023 Vol. 11 No. 17 Article 3651
This paper is devoted to problem-oriented reinforcement methods for the numerical implementation of Randomized Machine Learning. We have developed a scheme of the reinforcement procedure based on the agent approach and Bellman’s optimality principle. This procedure ensures strictly monotonic properties of a sequence of local records in the iterative computational procedure of the learning process. ...
Added: February 5, 2024
Tiapkin D., Belomestny D., Calandriello D. et al., in: Proceedings of the 40th International Conference on Machine Learning (ICML 2023), 23-29 July 2023, Honolulu, Hawaii, USA. Vol. 202. PMLR, 2023. P. 34161–34221.
Added: December 1, 2023
Tiapkin D., Belomestny D., Naumov A. et al., Working papers by Cornell University. Series math "arxiv.org" 2023 Article 2304.03056
In this work, we derive sharp non-asymptotic deviation bounds for weighted sums of Dirichlet random variables. These bounds are based on a novel integral representation of the density of a weighted Dirichlet sum. This representation allows us to obtain a Gaussian-like approximation for the sum distribution using geometry and complex analysis methods. Our results generalize ...
Added: June 28, 2023
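The abstract above concerns deviation bounds for weighted sums of Dirichlet random variables. A hedged Monte Carlo sketch (the weights and concentration parameters below are arbitrary illustrations, not from the paper) checks the empirical mean of such a sum against its closed form E[S] = Σᵢ wᵢαᵢ / Σⱼ αⱼ:

```python
import random

# Weighted Dirichlet sum S = sum_i w_i X_i with X ~ Dirichlet(alpha),
# sampled via normalized Gamma draws from the standard library.
alpha = [2.0, 3.0, 5.0]
w = [1.0, -0.5, 2.0]
a0 = sum(alpha)
exact_mean = sum(wi * ai for wi, ai in zip(w, alpha)) / a0  # (2 - 1.5 + 10) / 10

def dirichlet(alpha):
    """Sample Dirichlet(alpha) as normalized independent Gammas."""
    g = [random.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [gi / s for gi in g]

random.seed(42)
n = 50_000
emp = sum(
    sum(wi * xi for wi, xi in zip(w, dirichlet(alpha)))
    for _ in range(n)
) / n
# The empirical mean concentrates near exact_mean; the paper quantifies
# exactly how tightly, via non-asymptotic Gaussian-like deviation bounds.
```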
Belomestny D., Kaledin M., Golubev A., 2022.
Policy-gradient methods in reinforcement learning (RL) are highly general and widely applied in practice, but their performance suffers from the high variance of the gradient estimate. Several procedures have been proposed to reduce it, including actor-critic (AC) and advantage actor-critic (A2C) methods. Recently these approaches have gained a new perspective with the introduction of deep RL: both new control ...
Added: April 14, 2023
Ponomarenko A. A., Economics: The Open-Access, Open-Assessment E-Journal 2020 Vol. 14 P. 1–15
The author sets up a simple agent-based model in which agents learn by reinforcement while observing an incomplete set of variables. The model is employed to generate an artificial dataset that is used to estimate standard macroeconometric models. The author shows that the results are qualitatively indistinguishable (in terms of the signs and significance of the ...
Added: March 28, 2023
Grigoreva A., Gorin A., Klyuchnikov V. et al., Brain Stimulation 2023 Vol. 16 No. 1 P. 273
Transcranial electrical stimulation (TES) is a popular approach for studying and modulating cortical function. According to the somatic doctrine, anodal TES increases cortical excitability, while cathodal TES reduces it. Currently, numerous studies use TES in behavioral experiments with no physiological control, relying on the assumption that stimulation effects are reliable and fully predictable. However, control reveals the actual ...
Added: March 1, 2023
Tiapkin D., Belomestny D., Calandriello D. et al., in: Thirty-Sixth Conference on Neural Information Processing Systems (NeurIPS 2022). Curran Associates, Inc., 2022. P. 10737–10751.
Added: February 3, 2023
Bobrov E., Kropotov D., Lu H. et al., IEEE Communications Letters 2022 Vol. 26 No. 4 P. 818–822
The paper describes an online deep learning (ODL) algorithm for adaptive modulation and coding in massive MIMO. The algorithm is based on a fully connected neural network, which is initially trained on the output of the traditional algorithm and then incrementally retrained using the service feedback on its output. We show the advantage of our ...
Added: October 26, 2022
Tiapkin D., Gasnikov A., in: Proceedings of The 25th International Conference on Artificial Intelligence and Statistics (AISTATS 2022), 28-30 March 2022, a virtual conference. Vol. 151. PMLR, 2022. P. 9723–9740.
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs). For this purpose, some variant of Stochastic Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals. An important detail is the ability to use inexact values of functional constraints and compute the value of dual variables. We analyze ...
Added: October 16, 2022
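The abstract above applies a variant of Stochastic Mirror Descent to MDP policy optimization. As a hedged sketch of the mirror-descent primitive itself (entropy mirror map on the probability simplex, i.e. the exponentiated-gradient update; the cost vector and step size are illustrative, not the paper's problem):

```python
import math

# Mirror descent with the entropy mirror map, minimizing a linear cost
# <c, x> over the probability simplex. The update is multiplicative:
#   x_i <- x_i * exp(-eta * grad_i), followed by renormalization.
c = [0.9, 0.4, 0.7]   # cost vector; the optimum is the vertex at index 1
x = [1/3, 1/3, 1/3]   # start at the center of the simplex
eta = 0.5             # step size

for _ in range(200):
    w = [xi * math.exp(-eta * ci) for xi, ci in zip(x, c)]
    s = sum(w)
    x = [wi / s for wi in w]
# Mass concentrates on coordinate 1, the smallest cost.
```

The entropy mirror map keeps every iterate strictly inside the simplex without projections, which is why this family of methods is natural for optimizing over policies (distributions over actions).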
Tiapkin D., Belomestny D., Moulines E. et al., in: Proceedings of the 39th International Conference on Machine Learning. Vol. 162. PMLR, 2022. P. 21380–21431.
Added: July 11, 2022
Avdoshin S. M., Arutyunov G. A., Informatsionnye Tekhnologii (Information Technologies) 2022 Vol. 28 No. 7 P. 378–391
The global pandemic has highlighted the shortfall of human resources in the information technology sector. According to analysts' estimates, the shortage of IT specialists in Russia in 2021 was between 500 thousand and 1 million people. Educating such a large workforce and bringing it to the market may take years. The task of optimizing the process of ...
Added: June 11, 2022
Martinez-Saito M., Gorina E., Human Brain Mapping 2022 Vol. 43 No. 13 P. 4185–4206
Much of the uncertainty that clouds our understanding of the world springs from the covert values and intentions held by other people. Thus, it is plausible that specialized mechanisms that compute learning signals under uncertainty of exclusively social origin operate in the brain. To test this hypothesis, we scoured academic databases for neuroimaging studies involving ...
Added: May 27, 2022