Self-Imitation Learning from Demonstrations

Ivanov D., Pshikhachev G. A., Egorov V. S., Shpilman A.

Despite the numerous breakthroughs achieved with Reinforcement Learning (RL), solving environments with sparse rewards remains a challenging task that requires sophisticated exploration. Learning from Demonstrations (LfD) remedies this issue by guiding the agent's exploration towards states experienced by an expert. Naturally, the benefits of this approach hinge on the quality of demonstrations, which are rarely optimal in realistic scenarios. Modern LfD algorithms lack robustness to suboptimal demonstrations and introduce additional hyperparameters to control the influence of demonstrations. To address these issues, we extend Self-Imitation Learning (SIL), a recent RL algorithm that exploits the agent's past good experience, to the LfD setup by initializing its replay buffer with demonstrations. We denote our algorithm as SIL from Demonstrations (SILfD). Our theoretical analysis highlights that SILfD is safe to apply to demonstrations of any degree of suboptimality and automatically adjusts the influence of demonstrations throughout training. Our empirical investigation shows the superiority of SILfD over existing LfD algorithms in settings of suboptimal demonstrations and sparse rewards.
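
The mechanism summarized in the abstract can be illustrated with a short sketch. It is not the authors' implementation: it assumes the clipped-advantage weighting max(R - V(s), 0) from the original SIL formulation, uses toy integer states and actions, and the names Transition, discounted_returns, seed_buffer, and sil_weights are hypothetical. It only shows how a buffer seeded with demonstrations would be weighted as the value estimate changes.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Transition:
    state: int    # hypothetical discrete state id
    action: int   # hypothetical discrete action id
    ret: float    # discounted return observed from this step onward


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Return R_t = r_t + gamma * R_{t+1} for every step of one episode."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return returns[::-1]


Episode = Tuple[Sequence[int], Sequence[int], Sequence[float]]


def seed_buffer(demos: Sequence[Episode], gamma: float) -> List[Transition]:
    """Initialize the replay buffer with demonstration episodes."""
    buffer: List[Transition] = []
    for states, actions, rewards in demos:
        for s, a, g in zip(states, actions, discounted_returns(rewards, gamma)):
            buffer.append(Transition(s, a, g))
    return buffer


def sil_weights(buffer: Sequence[Transition],
                value_fn: Callable[[int], float]) -> List[float]:
    """Clipped advantage max(R - V(s), 0) for each stored transition.

    Assumption: this is the weight a SIL-style loss would give the
    transition; it is zero once the value estimate reaches the stored return.
    """
    return [max(t.ret - value_fn(t.state), 0.0) for t in buffer]


# One toy demonstration with a single sparse reward at the end.
demos = [([0, 1, 2], [1, 1, 0], [0.0, 0.0, 1.0])]
buffer = seed_buffer(demos, gamma=0.5)

# Early in training (value estimate 0): every demonstration step is imitated.
print(sil_weights(buffer, lambda s: 0.0))  # [0.25, 0.5, 1.0]

# Later (value estimate above the demonstrated returns): the weights vanish.
print(sil_weights(buffer, lambda s: 2.0))  # [0.0, 0.0, 0.0]
```

Under these assumptions, the second call reflects the behavior the abstract describes: once the agent's own value estimate exceeds what a demonstration achieved, that demonstration stops influencing the update, with no extra hyperparameter involved.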

Language: English

Keywords: deep reinforcement learning

In book

NeurIPS'2021 Deep Reinforcement Learning Workshop

[s.n.], 2021.

21st IEEE International Conference on Data Mining Workshops, ICDMW 2021

IEEE Computer Society, 2021.

The 21st IEEE International Conference on Data Mining (IEEE ICDM 2021) is a premier and truly international conference for researchers and practitioners in the broad area of data mining. The ICDM Workshops program (IEEE ICDMW) aims to provide a platform for multiple workshops with a range of more focused topics to be discussed and explored, where attendees can present ...

Added: February 4, 2022

Artificial Intelligence and Mathematical Models of Power Grids Driven by Renewable Energy Sources: A Survey

Srinivasan S., Kumarasamy S., Andreadakis Z. et al., Energies 2023 Vol. 16 No. 14 Article 5383

To face the impact of climate change in all dimensions of our society in the near future, the European Union (EU) has established an ambitious target. By 2050, the share of renewable power shall increase up to 75% of all power injected into today's power grids. While being clean and having become significantly cheaper, renewable ...

Added: July 17, 2023

Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization

Gritsaev T. G., Morozov N., Samsonov S. et al. / Series arXiv "math". 2024.

Generative Flow Networks (GFlowNets) are a family of generative models that learn to sample objects with probabilities proportional to a given reward function. The key concept behind GFlowNets is the use of two stochastic policies: a forward policy, which incrementally constructs compositional objects, and a backward policy, which sequentially deconstructs them. Recent results show a ...

Added: October 25, 2024

Maximum Entropy Model-based Reinforcement Learning

Svidchenko O., Shpilman A., in: NeurIPS'2021 Deep Reinforcement Learning Workshop. [s.n.], 2021.

Recent advances in reinforcement learning have demonstrated its ability to solve hard agent-environment interaction tasks on a super-human level. However, the application of reinforcement learning methods to practical and real-world tasks is currently limited due to the sample inefficiency of most state-of-the-art RL algorithms, i.e., the need for a vast number of training episodes. For example, OpenAI ...

Added: March 24, 2022

When to Switch: Planning and Learning for Partially Observable Multi-Agent Pathfinding

Skrynnik A., Andreychuk A., Yakovlev K. et al., IEEE Transactions on Neural Networks and Learning Systems 2023 P. 1–14

Multi-agent pathfinding (MAPF) is a problem that involves finding a set of non-conflicting paths for a set of agents confined to a graph. In this work, we study a MAPF setting, where the environment is only partially observable for each agent, i.e., an agent observes the obstacles and other agents only within a limited field-of-view. ...

Added: December 4, 2023

Dealing With Sparse Rewards Using Graph Neural Networks

Gerasyov M., Makarov I., IEEE Access 2023 Vol. 11 P. 89180–89187

Deep reinforcement learning in partially observable environments is a difficult task in itself and can be further complicated by a sparse reward signal. Most tasks involving navigation in three-dimensional environments provide the agent with minimal information. Typically, the agent receives a visual observation input from the environment and is rewarded once at the end of ...

Added: August 28, 2023

Improving GFlowNets with Monte Carlo Tree Search

Morozov N., Tiapkin D., Samsonov S. et al., in: ICML 2024 Workshop on Structured Probabilistic Inference & Generative Modeling. [s.n.], 2024.

Generative Flow Networks (GFlowNets) treat sampling from distributions over compositional discrete spaces as a sequential decision-making problem, training a stochastic policy to construct objects step by step. Recent studies have revealed strong connections between GFlowNets and entropy-regularized reinforcement learning. Building on these insights, we propose to enhance planning capabilities of GFlowNets by applying Monte Carlo ...

Added: October 24, 2024

NeurIPS'2021 Deep Reinforcement Learning Workshop

[s.n.], 2021.

Added: March 24, 2022