From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses

Reinforcement learning competitions advance the field by providing appropriate scope and support to develop solutions toward a specific problem. To promote the development of more broadly applicable methods, organizers need to enforce the use of general techniques, the use of sample-efficient methods, and the reproducibility of the results. While beneficial for the research community, these ...

Added: October 10, 2023

Model-free Posterior Sampling via Learning Rate Randomization

Tiapkin D., Belomestny D., Calandriello D. et al., in: Advances in Neural Information Processing Systems 36 (NeurIPS 2023). Curran Associates, Inc., 2023. P. 73719–73774.

Added: February 17, 2024

Homeostatic reinforcement learning for integrating reward collection and physiological stability.

Keramati M., Gutkin B. S., eLife 2014 Vol. 2 No. 3

Efficient regulation of internal homeostasis and defending it against perturbations requires adaptive behavioral strategies. However, the computational principles mediating the interaction between homeostatic and associative learning processes remain undefined. Here we use a definition of primary rewards, as outcomes fulfilling physiological needs, to build a normative theory showing how learning motivated behaviors may be modulated ...

Added: December 19, 2014

Finite Time Analysis of Linear Two-timescale Stochastic Approximation with Markovian Noise

Kaledin M., Moulines E., Naumov A. et al., in: Proceedings of Machine Learning Research. Vol. 125: Proceedings of Thirty Third Conference on Learning Theory. [s.n.], 2020. P. 2144–2203.

Added: July 30, 2020

Statistical Properties of Decision-Making Governed by Reinforcement Learning with Status Quo Bias

Lubashevsky I., Hijikata K., in: Proceedings of the 48th ISCIE International Symposium on Stochastic Systems Theory and its Applications. Vol. 2017. Kyoto: The Institute of Systems, Control and Information Engineers, 2017. P. 190–196.

Within the paradigm of human intermittent control over unstable systems, human behavior can be interpreted as a sequence of point-like moments when the operator decides whether to activate or deactivate the control. These decision-making events are assumed to be governed by information about the state of the system under control, which the operator accumulates continuously. ...

Added: November 5, 2021

A Comparative Evaluation of Machine Learning Methods for Robot Navigation Through Human Crowds

Shpilman A., Kudenko D., Gaydashenko A., in: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 2018. P. 553–557.

Robot navigation through crowds poses a difficult challenge to AI systems, since the methods should result in fast and efficient movement but at the same time are not allowed to compromise safety. Most approaches to date were focused on the combination of pathfinding algorithms with machine learning for pedestrian walking prediction. More recently, reinforcement learning ...

Added: January 18, 2019

Hybrid approach to design of storage attached network simulation systems

Karpov M., Arzymatov K., Belavin V. et al., International Journal of Civil Engineering and Technology 2018 Vol. 9 No. 11 P. 220–226

Simulators of real-world IT systems are gaining popularity today. However, as it often happens in the early stages of technological readiness, the same term can be understood as different things - from visualisation systems to multi-level multi-agent models. The critical feature of the simulation technology is the degree of trust, or proximity of resemblance of ...

Added: November 14, 2019

Sharp Deviations Bounds for Dirichlet Weighted Sums with Application to analysis of Bayesian algorithms

Tiapkin D., Belomestny D., Naumov A. et al., Working papers by Cornell University. Series math "arxiv.org" 2023 Article 2304.03056

In this work, we derive sharp non-asymptotic deviation bounds for weighted sums of Dirichlet random variables. These bounds are based on a novel integral representation of the density of a weighted Dirichlet sum. This representation allows us to obtain a Gaussian-like approximation for the sum distribution using geometry and complex analysis methods. Our results generalize ...

Added: June 28, 2023

Mastering the game of Stratego with model-free multiagent reinforcement learning

Malysheva A. I., PEROLAT J., VYLDER B. D., American Association for the Advancement of Science 378.6623 2022 Vol. 378 No. 6623 P. 990–996

Stratego is a popular two-player imperfect information board game. Because of its complexity stemming from its enormous game tree, decision-making under imperfect information, and a piece deployment phase at the start, Stratego poses a challenge for artificial intelligence (AI). Previous computer programs only performed at an amateur level at best. Perolat et al. introduce a model-free ...

Added: June 17, 2023

Artificial General Intelligence. 12th International Conference, AGI 2019, Shenzhen, China, August 6–9, 2019, Proceedings

Springer, 2019.

Added: October 30, 2020

Massive MIMO Adaptive Modulation and Coding Using Online Deep Learning Algorithm

Bobrov E., Kropotov Dmitry, Lu H. et al., IEEE Communications Letters 2022 Vol. 26 No. 4 P. 818–822

The paper describes an online deep learning algorithm (ODL) for adaptive modulation and coding in massive MIMO. The algorithm is based on a fully connected neural network, which is initially trained on the output of the traditional algorithm and then incrementally retrained by the service feedback of its output. We show the advantage of our ...

Added: October 26, 2022

Learning under social versus nonsocial uncertainty: A meta-analytic approach

Martinez-Saito M., Gorina E., Human Brain Mapping 2022 Vol. 43 No. 13 P. 4185–4206

Much of the uncertainty that clouds our understanding of the world springs from the covert values and intentions held by other people. Thus, it is plausible that specialized mechanisms that compute learning signals under uncertainty of exclusively social origin operate in the brain. To test this hypothesis, we scoured academic databases for neuroimaging studies involving ...

Added: May 27, 2022

Grid Path Planning with Deep Reinforcement Learning: Preliminary Results

Panov A. I., Yakovlev K., Suvorov R. E., Procedia Computer Science 2018 Vol. 123 P. 347–353

Single-shot grid-based path finding is an important problem with applications in robotics, video games, etc. Typically, in the AI community, heuristic search methods (based on A* and its variations) are used to solve it. In this work we present the results of preliminary studies on how neural networks can be utilized for path planning on ...

Added: September 3, 2018

Social Influence and Persuasion and Message Propagation

Shestakova A., Klucharev V., in: Brain Mapping: An Encyclopedic Reference. San Diego: Academic Press, 2015.

Our decisions are affected not only by objective information about the available options but also by other people. Recent brain imaging studies have adopted the cognitive neuroscience approach for studying the neural mechanisms of social influence. A number of studies have shown that social influence is associated with neural activity in the medial prefrontal cortex ...

Added: October 22, 2014

Electrophysiological precursors of social conformity

Shestakova A., Rieskamp J., Tugin S. et al., Social Cognitive and Affective Neuroscience 2013 Vol. 8 No. 7 P. 756–763

Humans often change their beliefs or behavior due to the behavior or opinions of others. This study explored, with the use of human event-related potentials (ERPs), whether social conformity is based on a general performance-monitoring mechanism. We tested the hypothesis that conflicts with a normative group opinion evoke a feedback-related negativity (FRN) often associated with ...

Added: June 6, 2013

Task Planning in “Block World” with Deep Reinforcement Learning

Ayunts E., Panov A. I., in: Biologically Inspired Cognitive Architectures (BICA) for Young Scientists. Springer, 2017. P. 3–9.

Reinforcement learning has recently advanced significantly with the discovery of new techniques and instruments for training. This paper is devoted to the application of convolutional and recurrent neural networks to the task of planning with reinforcement learning. The aim of the work is to check whether neural networks are fit for this problem. ...

Added: August 31, 2017

Scale-free memory model for multiagent reinforcement learning. Mean field approximation and rock-paper-scissors dynamics

Lubashevsky I., Kanemoto S., The European Physical Journal B 2010 Vol. 76 No. 1 P. 69–85

A continuous time model for multiagent systems governed by reinforcement learning with scale-free memory is developed. The agents are assumed to act independently of one another in optimizing their choice of possible actions via trial-and-error search. To gain awareness about the action value the agents accumulate in their memory the rewards obtained from taking a ...

Added: November 6, 2021

First-Person Shooter Game for Virtual Reality Headset with Advanced Multi-Agent Intelligent System

Makarov I., Mikhail Tokmakov, Pavel Polyakov et al., in: Proceedings of the 24th ACM international conference on Multimedia (ACM MM'16), Amsterdam, Netherlands, 15-19 October 2016. NY: Association for Computing Machinery (ACM), 2016. P. 735–736.

We present a multiplayer first-person shooter (FPS) game with advanced intelligent non-playable characters (NPC) under computer control. The game is specially adapted for playing in a VR headset so that simulator sickness symptoms are significantly reduced. The demo allows users to play with other human and NPC players in a shooter game made in Unreal Engine ...

Added: August 28, 2016

Primal-Dual Stochastic Mirror Descent for MDPs

Tiapkin D., Alexander Gasnikov, in: International Conference on Artificial Intelligence and Statistics, 28-30 March 2022, A Virtual Conference. Vol. 151: Proceedings of The 25th International Conference on Artificial Intelligence and Statistics. PMLR, 2022. P. 9723–9740.

We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs). For this purpose, some variant of Stochastic Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals. An important detail is the ability to use inexact values of functional constraints and compute the value of dual variables. We analyze ...

Added: October 16, 2022

Cocaine addiction as a homeostatic reinforcement learning disorder

Keramati M., Durand A., Girardeau P. et al., Psychological Review 2017 Vol. 124 No. 2 P. 130–153

Drug addiction implicates both reward learning and homeostatic regulation mechanisms of the brain. This has stimulated 2 partially successful theoretical perspectives on addiction. Many important aspects of addiction, however, remain to be explained within a single, unified framework that integrates the 2 mechanisms. Building upon a recently developed homeostatic reinforcement learning theory, the authors focus on ...

Added: April 7, 2017

Unstable Dynamics of Adaptation in Unknown Environment due to Novelty Seeking

Lubashevsky I., Zgonnikov A., Advances in Complex Systems 2014 Vol. 17 No. 3-4 Article 1450013

Learning and adaptation play a great role in emergent socio-economic phenomena. Complex dynamics has been previously found in systems of multiple learning agents interacting via a simple game. Meanwhile, single-agent adaptation is considered trivially stable. We advocate the idea that adopting a more complex model of the individual behavior may result in a ...

Added: November 6, 2021

Fast Rates for Maximum Entropy Exploration

Tiapkin D., Belomestny D., Calandriello D. et al., in: Proceedings of the 40th International Conference on Machine Learning, 23-29 July 2023, Honolulu, Hawaii, USA. Vol. 202. PMLR, 2023. P. 34161–34221.

Added: December 1, 2023

Adaptive and Learning Agents Workshop at International Joint Conference on Autonomous Agents and Multiagent Systems

[s.n.], 2019.

Added: June 13, 2019

A Survey of Neural Network Methods for Code Analysis and Generation

Avdoshin S. M., Arutyunov G. A., Informatsionnye Tekhnologii [Information Technologies] 2022 Vol. 28 No. 7 P. 378–391

The global pandemic has highlighted the shortage of human resources in the information technology sector. According to analysts' estimates, the shortage of IT specialists in Russia in 2021 was between 500 thousand and 1 million people. Educating and bringing to market so many specialists may take years. The task of optimizing the process of ...

Added: June 11, 2022