From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses
Vol. 162. PMLR, 2022
In: Proceedings of the 48th ISCIE International Symposium on Stochastic Systems Theory and its Applications. Vol. 2017. Kyoto: The Institute of Systems, Control and Information Engineers, 2017. P. 190-196.
Within the paradigm of human intermittent control over unstable systems, human behavior can be interpreted as a sequence of point-like moments when the operator decides to activate or deactivate the control. These decision-making events are assumed to be governed by the information about the state of the system under control, which the operator accumulates continuously. ...
Added: November 5, 2021
Frontiers in Human Neuroscience 2019 Vol. 13 No. 382 P. 1-12
Numerous cognitive studies have demonstrated experience-induced plasticity in the primary sensory cortex, indicating that repeated decisions could modulate sensory processing. In this context, we investigated whether an auditory version of the monetary incentive delay (MID) task could change the neural processing of the incentive cues that code expected monetary outcomes. To study sensory plasticity, we presented the incentive cues as ...
Added: October 23, 2019
In: International Conference on Artificial Intelligence and Statistics, 28-30 March 2022, A Virtual Conference. Vol. 151: Proceedings of The 25th International Conference on Artificial Intelligence and Statistics. PMLR, 2022. P. 9723-9740.
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs). For this purpose, a variant of Stochastic Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals. An important detail is the ability to use inexact values of functional constraints and compute the value of dual variables. We analyze ...
Added: October 16, 2022
Human Brain Mapping 2022 Vol. 43 No. 13 P. 4185-4206
Much of the uncertainty that clouds our understanding of the world springs from the covert values and intentions held by other people. Thus, it is plausible that specialized mechanisms that compute learning signals under uncertainty of exclusively social origin operate in the brain. To test this hypothesis, we scoured academic databases for neuroimaging studies involving ...
Added: May 27, 2022
Информационные технологии 2022 Vol. 28 No. 7 P. 378-391
The global pandemic has highlighted the shortfall of human resources in the information technology sector. According to analysts' estimates, the shortage of IT specialists in Russia in 2021 was between 500 thousand and 1 million people. Educating and bringing to market so many personnel may take years. The task of optimizing the process of ...
Added: June 11, 2022
Social Cognitive and Affective Neuroscience 2022 Vol. 17 No. 9 P. 837-849
Why do people often exhaust unregulated common (shared) natural resources but manage to preserve similar private resources? To answer this question, in this study we combine a neurobiological, economic, and cognitive modeling approach. Using functional magnetic resonance imaging on 50 participants, we show that a sharp decrease of common and private resources is associated with ...
Added: February 8, 2022
International Journal of Civil Engineering and Technology 2018 Vol. 9 No. 11 P. 220-226
Simulators of real-world IT systems are gaining popularity today. However, as it often happens in the early stages of technological readiness, the same term can be understood as different things - from visualisation systems to multi-level multi-agent models. The critical feature of the simulation technology is the degree of trust, or proximity of resemblance of ...
Added: November 14, 2019
eLife 2014 Vol. 2 No. 3
Efficient regulation of internal homeostasis and defending it against perturbations requires adaptive behavioral strategies. However, the computational principles mediating the interaction between homeostatic and associative learning processes remain undefined. Here we use a definition of primary rewards, as outcomes fulfilling physiological needs, to build a normative theory showing how learning motivated behaviors may be modulated ...
Added: December 19, 2014
Artificial General Intelligence. 12th International Conference, AGI 2019, Shenzhen, China, August 6–9, 2019, Proceedings
Added: October 30, 2020
IEEE Communications Letters 2022 Vol. 26 No. 4 P. 818-822
The paper describes an online deep learning algorithm (ODL) for adaptive modulation and coding in massive MIMO. The algorithm is based on a fully connected neural network, which is initially trained on the output of the traditional algorithm and then incrementally retrained by the service feedback of its output. We show the advantage of our ...
Added: October 26, 2022
In: Brain Mapping: An Encyclopedic Reference. San Diego: Academic Press, 2015.
Our decisions are affected not only by objective information about the available options but also by other people. Recent brain imaging studies have adopted the cognitive neuroscience approach for studying the neural mechanisms of social influence. A number of studies have shown that social influence is associated with neural activity in the medial prefrontal cortex ...
Added: October 22, 2014
In: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 2018. P. 553-557.
Robot navigation through crowds poses a difficult challenge to AI systems, since the methods should result in fast and efficient movement but at the same time are not allowed to compromise safety. Most approaches to date were focused on the combination of pathfinding algorithms with machine learning for pedestrian walking prediction. More recently, reinforcement learning ...
Added: January 18, 2019
In: Proceedings of the Thirtieth International Florida Artificial Intelligence Research Society Conference, FLAIRS 2017, Marco Island, Florida, USA, May 22-24, 2017. ISBN 978-1-57735-787-2. Palo Alto: AAAI Press, 2017. P. 412-415.
This article considers a combination of two modern aspects of game development: (i) the impact of high-quality graphics and virtual reality (VR) on how readily users come to believe in the realness of in-game events seen with their own eyes; (ii) modeling the behavior of a computer-controlled enemy, called a BOT, which reacts similarly to human players. ...
Added: June 24, 2017
Advances in the Astronautical Sciences 2020 Vol. 170 P. 305-319
The number of space objects will grow several times in a few years due to the planned launches of constellations of thousands of microsatellites. This leads to a significant increase in the threat of satellite collisions. Spacecraft must undertake collision avoidance maneuvers to mitigate the risk. According to publicly available information, conjunction events are now manually ...
Added: October 10, 2019
Social Cognitive and Affective Neuroscience 2013 Vol. 8 No. 7 P. 756-763
Humans often change their beliefs or behavior due to the behavior or opinions of others. This study explored, with the use of human event-related potentials (ERPs), whether social conformity is based on a general performance-monitoring mechanism. We tested the hypothesis that conflicts with a normative group opinion evoke a feedback-related negativity (FRN) often associated with ...
Added: June 6, 2013
In: Biologically Inspired Cognitive Architectures (BICA) for Young Scientists. Springer, 2017. P. 3-9.
Reinforcement learning has recently advanced significantly with the discovery of new techniques and instruments for training. This paper is devoted to the application of convolutional and recurrent neural networks to the task of planning with reinforcement learning. The aim of the work is to check whether neural networks are fit for this problem. ...
Added: August 31, 2017
Scale-free memory model for multiagent reinforcement learning. Mean field approximation and rock-paper-scissors dynamics
The European Physical Journal B 2010 Vol. 76 No. 1 P. 69-85
A continuous time model for multiagent systems governed by reinforcement learning with scale-free memory is developed. The agents are assumed to act independently of one another in optimizing their choice of possible actions via trial-and-error search. To gain awareness about the action value the agents accumulate in their memory the rewards obtained from taking a ...
Added: November 6, 2021
In: Proceedings of the 24th ACM international conference on Multimedia (ACM MM'16), Amsterdam, Netherlands, 15-19 October 2016. NY: Association for Computing Machinery (ACM), 2016. P. 735-736.
We present a multiplayer first-person shooter (FPS) game with advanced intelligent non-playable characters (NPC) under computer control. The game is specially adapted for playing in a VR headset so that simulator sickness symptoms are significantly reduced. The demo allows users to play with other human and NPC players in a shooter game made in Unreal Engine ...
Added: August 28, 2016
Psychological Review 2017 Vol. 124 No. 2 P. 130-153
Drug addiction implicates both reward learning and homeostatic regulation mechanisms of the brain. This has stimulated 2 partially successful theoretical perspectives on addiction. Many important aspects of addiction, however, remain to be explained within a single, unified framework that integrates the 2 mechanisms. Building upon a recently developed homeostatic reinforcement learning theory, the authors focus on ...
Added: April 7, 2017
Advances in Complex Systems 2014 Vol. 17 No. 3-4 Article 1450013
Learning and adaptation play a great role in emergent socio-economic phenomena. Complex dynamics have previously been found in systems of multiple learning agents interacting via a simple game. Meanwhile, single-agent adaptation is considered trivially stable. We advocate the idea that adopting a more complex model of the individual behavior may result in a ...
Added: November 6, 2021
In: Proceedings of Machine Learning Research. Vol. 125: Proceedings of Thirty Third Conference on Learning Theory. [s.n.], 2020. P. 2144-2203.
Added: July 30, 2020
Adaptive and Learning Agents Workshop at International Joint Conference on Autonomous Agents and Multiagent Systems
Added: June 13, 2019
Procedia Computer Science 2018 Vol. 123 P. 347-353
Single-shot grid-based path finding is an important problem with applications in robotics, video games, etc. Typically, the AI community uses heuristic search methods (based on A* and its variations) to solve it. In this work we present the results of preliminary studies on how neural networks can be utilized for path planning on ...
Added: September 3, 2018
In: Proceedings of the Third Workshop on Experimental Economics and Machine Learning (EEML 2016), Moscow, Russia, July 18, 2016. Vol. 1627. Aachen: CEUR Workshop Proceedings, 2016. Ch. 3. P. 24-33.
We present two examples of how human-like behavior can be implemented in a model of a computer player to improve its characteristics and decision-making patterns in a video game. First, we describe a reinforcement learning model, which helps to choose the best weapon depending on reward values obtained from shooting combat situations. Second, we consider an ...
Added: August 27, 2016