Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise, in: Proceedings of the Thirty Third Conference on Learning Theory. Vol. 125. [s.n.], 2020. P. 2144-2203.
Shpilman A., Nikulin A. P., Proceedings of Machine Learning Research 2022 Vol. 176 P. 13-28
Reinforcement learning competitions advance the field by providing appropriate scope and support to develop solutions toward a specific problem. To promote the development of more broadly applicable methods, organizers need to enforce the use of general techniques, the use of sample-efficient methods, and the reproducibility of the results. While beneficial for the research community, these ...
Added: October 10, 2023
Shpilman A., Kudenko D., Gaydashenko A., in: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 2018. P. 553-557.
Robot navigation through crowds poses a difficult challenge to AI systems, since the methods should result in fast and efficient movement but at the same time are not allowed to compromise safety. Most approaches to date were focused on the combination of pathfinding algorithms with machine learning for pedestrian walking prediction. More recently, reinforcement learning ...
Added: January 18, 2019
Durmus A., Moulines E., Naumov A. et al., Mathematics of Operations Research 2024 Vol. - No. - Article -
This paper provides a finite-time analysis of linear stochastic approximation (LSA) algorithms with fixed step size, a core method in statistics and machine learning. LSA is used to compute approximate solutions of a $d$-dimensional linear system $\bar{\mathbf{A}} \theta = \bar{\mathbf{b}}$, for which $(\bar{\mathbf{A}}, \bar{\mathbf{b}})$ can only be estimated through (asymptotically) unbiased observations $\{(\mathbf{A}(Z_n),\mathbf{b}(Z_n))\}_{n \in \mathbb{N}}$. ...
Added: July 13, 2022
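A minimal sketch of the fixed-step LSA iteration described above, assuming a toy observation model in which the noisy pairs $(\mathbf{A}(Z_n),\mathbf{b}(Z_n))$ are simulated as Gaussian perturbations of $(\bar{\mathbf{A}}, \bar{\mathbf{b}})$ (all names, dimensions, and noise levels are illustrative, not the paper's setting):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
A_bar = 2.0 * np.eye(d)                 # target system: A_bar @ theta = b_bar
b_bar = np.array([1.0, -1.0, 0.5])
theta_star = np.linalg.solve(A_bar, b_bar)

alpha = 0.05                            # fixed step size
theta = np.zeros(d)
for n in range(20000):
    # unbiased noisy observations of (A_bar, b_bar)
    A_n = A_bar + 0.1 * rng.standard_normal((d, d))
    b_n = b_bar + 0.1 * rng.standard_normal(d)
    # LSA update: theta_{n+1} = theta_n + alpha * (b_n - A_n @ theta_n)
    theta = theta + alpha * (b_n - A_n @ theta)

print(np.linalg.norm(theta - theta_star))  # small residual around theta_star
```

With a fixed step size the iterates do not converge to $\theta^\star$ but fluctuate in a neighborhood of it, which is exactly the regime the finite-time analysis quantifies.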
Springer, 2019
Added: October 30, 2020
Martinez-Saito M., Gorina E., Human Brain Mapping 2022 Vol. 43 No. 13 P. 4185-4206
Much of the uncertainty that clouds our understanding of the world springs from the covert values and intentions held by other people. Thus, it is plausible that specialized mechanisms that compute learning signals under uncertainty of exclusively social origin operate in the brain. To test this hypothesis, we scoured academic databases for neuroimaging studies involving ...
Added: May 27, 2022
Shestakova A., Klucharev V., in: Brain Mapping: An Encyclopedic Reference. San Diego: Academic Press, 2015.
Our decisions are affected not only by objective information about the available options but also by other people. Recent brain imaging studies have adopted the cognitive neuroscience approach for studying the neural mechanisms of social influence. A number of studies have shown that social influence is associated with neural activity in the medial prefrontal cortex ...
Added: October 22, 2014
Karpov M., Arzymatov K., Belavin V. et al., International Journal of Civil Engineering and Technology 2018 Vol. 9 No. 11 P. 220-226
Simulators of real-world IT systems are gaining popularity today. However, as often happens in the early stages of technological readiness, the same term can be understood to mean different things, from visualisation systems to multi-level multi-agent models. The critical feature of the simulation technology is the degree of trust, or the proximity of resemblance of ...
Added: November 14, 2019
Tiapkin D., Belomestny D., Naumov A. et al., Working papers by Cornell University. Series math "arxiv.org" 2023 Article 2304.03056
In this work, we derive sharp non-asymptotic deviation bounds for weighted sums of Dirichlet random variables. These bounds are based on a novel integral representation of the density of a weighted Dirichlet sum. This representation allows us to obtain a Gaussian-like approximation for the sum distribution using geometry and complex analysis methods. Our results generalize ...
Added: June 28, 2023
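A quick Monte-Carlo illustration of the weighted Dirichlet sums the paper studies: sample from a Dirichlet distribution, form the weighted sum, and check its mean against the closed form $\mathbb{E}[w \cdot X] = \sum_i w_i \alpha_i / \sum_i \alpha_i$ (the concentration parameters and weights here are illustrative; the paper's contribution is non-asymptotic deviation bounds for the distribution of this sum, not the mean):

```python
import numpy as np

rng = np.random.default_rng(2)
alpha = np.array([1.0, 2.0, 3.0])      # Dirichlet concentration parameters
w = np.array([0.2, 0.5, 1.0])          # weights

X = rng.dirichlet(alpha, size=200_000)  # samples on the probability simplex
S = X @ w                               # weighted Dirichlet sums

# E[w . X] = sum_i w_i * alpha_i / sum_i alpha_i
mean_exact = w @ alpha / alpha.sum()
print(S.mean(), mean_exact)             # empirical mean close to exact mean
```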
Keramati M., Durand A., Girardeau P. et al., Psychological Review 2017 Vol. 124 No. 2 P. 130-153
Drug addiction implicates both reward learning and homeostatic regulation mechanisms of the brain. This has stimulated 2 partially successful theoretical perspectives on addiction. Many important aspects of addiction, however, remain to be explained within a single, unified framework that integrates the 2 mechanisms. Building upon a recently developed homeostatic reinforcement learning theory, the authors focus on ...
Added: April 7, 2017
Tiapkin D., Belomestny D., Calandriello D. et al., in: Proceedings of the 40th International Conference on Machine Learning (23-29 July 2023, Honolulu, Hawaii, USA). Vol. 202. PMLR, 2023. P. 34161-34221.
We address the challenge of exploration in reinforcement learning (RL) when the agent operates in an unknown environment with sparse or no rewards. In this work, we study the maximum entropy exploration problem of two different types. The first type is visitation entropy maximization previously considered by Hazan et al. (2019) in the discounted setting. ...
Added: December 1, 2023
[s.n.], 2019
Adaptive and Learning Agents Workshop at International Joint Conference on Autonomous Agents and Multiagent Systems ...
Added: June 13, 2019
Bobrov E., Kropotov Dmitry, Lu H. et al., IEEE Communications Letters 2022 Vol. 26 No. 4 P. 818-822
The paper describes an online deep learning algorithm (ODL) for adaptive modulation and coding in massive MIMO. The algorithm is based on a fully connected neural network, which is initially trained on the output of the traditional algorithm and then incrementally retrained by the service feedback of its output. We show the advantage of our ...
Added: October 26, 2022
Lubashevsky I., Hijikata K., in: Proceedings of the 48th ISCIE International Symposium on Stochastic Systems Theory and its Applications. Kyoto: The Institute of Systems, Control and Information Engineers, 2017. P. 190-196.
Within the paradigm of human intermittent control over unstable systems, human behavior admits interpretation as a sequence of point-like moments at which the operator decides to activate or deactivate control. These decision-making events are assumed to be governed by the information about the state of the system under control which the operator accumulates continuously. ...
Added: November 5, 2021
Panov A. I., Yakovlev K., Suvorov R. E., Procedia Computer Science 2018 Vol. 123 P. 347-353
Single-shot grid-based path finding is an important problem with applications in robotics, video games, etc. Typically, in the AI community, heuristic search methods (based on A* and its variations) are used to solve it. In this work we present the results of preliminary studies on how neural networks can be utilized for path planning on ...
Added: September 3, 2018
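A minimal sketch of the classical grid-based A* baseline that such learned planners are compared against (the 4-connected grid, Manhattan heuristic, and test map below are illustrative, not from the paper):

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected grid; grid[r][c] == 1 marks an obstacle."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    frontier = [(h(start), 0, start, [start])]  # (f = g + h, g, node, path)
    seen = set()
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                heapq.heappush(frontier,
                               (g + 1 + h((nr, nc)), g + 1, (nr, nc), path + [(nr, nc)]))
    return None  # goal unreachable

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
path = astar(grid, (0, 0), (2, 0))
print(path)  # shortest route around the obstacle row
```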
Shestakova A., Rieskamp J., Tugin S. et al., Social Cognitive and Affective Neuroscience 2013 Vol. 8 No. 7 P. 756-763
Humans often change their beliefs or behavior due to the behavior or opinions of others. This study explored, with the use of human event-related potentials (ERPs), whether social conformity is based on a general performance-monitoring mechanism. We tested the hypothesis that conflicts with a normative group opinion evoke a feedback-related negativity (FRN) often associated with ...
Added: June 6, 2013
Ayunts E., Panov A. I., , in : Biologically Inspired Cognitive Architectures (BICA) for Young Scientists. : Springer, 2017. P. 3-9.
At the moment, reinforcement learning has advanced significantly with the discovery of new techniques and instruments for training. This paper is devoted to the application of convolutional and recurrent neural networks to the task of planning in a reinforcement learning problem. The aim of the work is to check whether such neural networks are fit for this problem. ...
Added: August 31, 2017
Lubashevsky I., Kanemoto S., The European Physical Journal B 2010 Vol. 76 No. 1 P. 69-85
A continuous time model for multiagent systems governed by reinforcement learning with scale-free memory is developed. The agents are assumed to act independently of one another in optimizing their choice of possible actions via trial-and-error search. To gain awareness about the action value the agents accumulate in their memory the rewards obtained from taking a ...
Added: November 6, 2021
Makarov I., Mikhail Tokmakov, Pavel Polyakov et al., in: Proceedings of the 24th ACM International Conference on Multimedia (ACM MM'16), Amsterdam, Netherlands, 15-19 October 2016. NY: Association for Computing Machinery (ACM), 2016. P. 735-736.
We present a multiplayer first-person shooter (FPS) game with advanced intelligent non-playable characters (NPC) under computer control. The game is specially adapted for playing in VR headset so the simulator sickness symptoms are significantly reduced.
The demo allows users to play with the other human and NPC players in a shooter game made in Unreal Engine ...
Added: August 28, 2016
Keramati M., Gutkin B. S., eLife 2014 Vol. 2 No. 3
Efficient regulation of internal homeostasis and defending it against perturbations requires adaptive behavioral strategies. However, the computational principles mediating the interaction between homeostatic and associative learning processes remain undefined. Here we use a definition of primary rewards, as outcomes fulfilling physiological needs, to build a normative theory showing how learning motivated behaviors may be modulated ...
Added: December 19, 2014
Tiapkin D., Alexander Gasnikov, in: Proceedings of The 25th International Conference on Artificial Intelligence and Statistics (28-30 March 2022, a virtual conference). Vol. 151. PMLR, 2022. P. 9723-9740.
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs). For this purpose, some variant of Stochastic Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals. An important detail is the ability to use inexact values of functional constraints and compute the value of dual variables. We analyze ...
Added: October 16, 2022
Malysheva A. I., Perolat J., Vylder B. D., Science 2022 Vol. 378 No. 6623 P. 990-996
Stratego is a popular two-player imperfect information board game. Because of its complexity stemming from its enormous game tree, decision-making under imperfect information, and a piece deployment phase at the start, Stratego poses a challenge for artificial intelligence (AI). Previous computer programs only performed at an amateur level at best. Perolat et al. introduce a model-free ...
Added: June 17, 2023
Lubashevsky I., Zgonnikov A., Advances in Complex Systems 2014 Vol. 17 No. 3-4 Article 1450013
Learning and adaptation play a great role in emergent socio-economic phenomena. Complex dynamics has previously been found in systems of multiple learning agents interacting via a simple game. Meanwhile, single-agent adaptation is considered trivially stable. We advocate the idea that adopting a more complex model of individual behavior may result in a ...
Added: November 6, 2021
Barkhagen M., Chau N. H., Moulines E. et al., Bernoulli: a journal of mathematical statistics and probability 2021 Vol. 27 No. 1 P. 1-33
We study the problem of sampling from a probability distribution $\pi$ on $\mathbb{R}^d$ which has a density w.r.t. the Lebesgue measure known up to a normalization factor, $x \mapsto e^{-U(x)} / \int_{\mathbb{R}^d} e^{-U(y)}\,dy$. We analyze a sampling method based on the Euler discretization of the Langevin stochastic differential equations under the assumptions that the potential $U$ is continuously differentiable, $\nabla U$ is Lipschitz, and $U$ is strongly concave. We focus on the ...
Added: December 9, 2021
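A minimal sketch of the Euler discretization of the Langevin SDE analyzed above, targeting a standard Gaussian, so $U(x) = \|x\|^2/2$ and $\nabla U(x) = x$ (the target, step size, and chain length are illustrative, not the paper's general setting):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 2
grad_U = lambda x: x                # U(x) = ||x||^2 / 2, so the target is N(0, I)

gamma = 0.05                        # discretization step size
x = np.zeros(d)
samples = []
for k in range(50_000):
    # Euler step: x_{k+1} = x_k - gamma * grad U(x_k) + sqrt(2 gamma) * xi_k
    x = x - gamma * grad_U(x) + np.sqrt(2 * gamma) * rng.standard_normal(d)
    if k >= 10_000:                 # discard burn-in
        samples.append(x)

samples = np.array(samples)
print(samples.mean(axis=0), samples.var(axis=0))  # means near 0, variances near 1
```

The discretization introduces a step-size-dependent bias in the stationary distribution; quantifying that bias under smoothness and concavity assumptions on the potential is what analyses of this kind make precise.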
Tiapkin D., Belomestny D., Calandriello D. et al., in: Advances in Neural Information Processing Systems 36 (NeurIPS 2023). Curran Associates, Inc., 2023. P. 73719-73774.
Added: February 17, 2024