Maximum Entropy Model-based Reinforcement Learning


Svidchenko O., Shpilman A.

Recent advances in reinforcement learning have demonstrated its ability to solve hard agent-environment interaction tasks at a superhuman level. However, the application of reinforcement learning methods to practical, real-world tasks is currently limited by the sample inefficiency of most state-of-the-art RL algorithms, i.e., their need for a vast number of training episodes. For example, the OpenAI Five algorithm, which has beaten human players in Dota 2, trained for thousands of years of game time. Several approaches tackle the issue of sample inefficiency: they either offer a more efficient use of already gathered experience or aim to gain more relevant and diverse experience via better exploration of the environment. However, to our knowledge, no such approach exists for model-based algorithms, which have shown high sample efficiency in solving hard control tasks with high-dimensional state spaces. This work connects exploration techniques and model-based reinforcement learning. We have designed a novel exploration method that takes into account the features of the model-based approach. We also demonstrate through experiments that our method significantly improves the performance of the model-based algorithm Dreamer.

Language: English

Keywords: deep reinforcement learning

In book

NeurIPS'2021 Deep Reinforcement Learning Workshop

[s.n.], 2021.

Learning-Based UAV–RIS Secure Communication Under Eavesdropper Location Uncertainty

Ehab S. Suleiman, Ali J. Dayoub, in: Proceedings of the 2026 8th International Youth Conference on Radio Electronics, Electrical and Power Engineering (REEPE). IEEE, 2026. Ch. 165. P. 1–6.

Unmanned aerial vehicle (UAV)-assisted reconfigurable intelligent surface (RIS) systems can enhance physical layer security through joint mobility and propagation control. However, most existing designs assume the availability of the eavesdropper's channel state information (CSI), which is unrealistic in passive eavesdropping scenarios. In this paper, secure UAV-RIS downlink communication is studied under bounded eavesdropper location uncertainty, ...

Added: April 30, 2026

Optical stabilization for laser communication satellite systems through proportional–integral–derivative (PID) control and reinforcement learning approach

Бахшалиев Р. М., Reutov A., Vorobey S. et al., Review of Scientific Instruments 2025 Vol. 96 No. 3

One of the main issues of satellite-to-ground optical communication, including free-space satellite quantum key distribution (QKD), is achieving reasonable accuracy in positioning, navigation, and optical stabilization. Proportional–integral–derivative (PID) controllers can handle various control tasks in optical systems. Recent research shows promising results in the area of composite control systems including ...

Added: May 13, 2025

Optimization of the Accelerator Control by Reinforcement Learning: A Simulation-Based Approach

Ibrahim A., Derkach D., Petrenko A. et al., Physics of Particles and Nuclei 2025 Vol. 56 No. 6 P. 1476–1481

Optimizing accelerator control is a critical challenge in experimental particle physics, requiring significant manual effort and resource expenditure. Traditional tuning methods are often time-consuming and reliant on expert input, highlighting the need for more efficient approaches. This study aims to create a simulation-based framework integrated with Reinforcement Learning (RL) to address these challenges. Using Elegant ...

Added: March 16, 2025

Adaptive Algorithm for Selecting the Optimal Trading Strategy Based on Reinforcement Learning for Managing a Hedge Fund

Belyakov B., Sizykh D., IEEE Access 2024 Vol. 12 P. 189047–189063

In hedge fund management, the ability to dynamically select optimal trading strategies is paramount for maximizing returns and mitigating risk. This paper presents a pioneering approach that integrates Reinforcement Learning (RL), specifically the Proximal Policy Optimization (PPO) algorithm, into the strategy selection process for hedge fund management. Our model considers a diverse array of strategies, ...

Added: January 15, 2025

Improving GFlowNets with Monte Carlo Tree Search

Morozov N., Tiapkin D., Samsonov S. et al., in: ICML 2024 Workshop on Structured Probabilistic Inference & Generative Modeling. OpenReview, 2024.

Added: October 24, 2024

When to Switch: Planning and Learning for Partially Observable Multi-Agent Pathfinding

Skrynnik A., Andreychuk A., Yakovlev K. et al., IEEE Transactions on Neural Networks and Learning Systems 2024 Vol. 35 No. 12 P. 17411–17424

Multi-agent pathfinding (MAPF) is a problem that involves finding a set of non-conflicting paths for a set of agents confined to a graph. In this work, we study a MAPF setting, where the environment is only partially observable for each agent, i.e., an agent observes the obstacles and other agents only within a limited field-of-view. ...

Added: December 4, 2023

Dealing With Sparse Rewards Using Graph Neural Networks

Gerasyov Matvey, Makarov I., IEEE Access 2023 Vol. 11 P. 89180–89187

Deep reinforcement learning in partially observable environments is a difficult task in itself and can be further complicated by a sparse reward signal. Most tasks involving navigation in three-dimensional environments provide the agent with minimal information. Typically, the agent receives a visual observation input from the environment and is rewarded once at the end of ...

Added: August 28, 2023

Artificial Intelligence and Mathematical Models of Power Grids Driven by Renewable Energy Sources: A Survey

Srinivasan S., Kumarasamy S., Andreadakis Z. et al., Energies 2023 Vol. 16 No. 14 Article 5383

To face the impact of climate change in all dimensions of our society in the near future, the European Union (EU) has established an ambitious target. By 2050, the share of renewable power shall increase to 75% of all power injected into today’s power grids. While being clean and having become significantly cheaper, renewable ...

Added: July 17, 2023

Self-Imitation Learning from Demonstrations

Ivanov D., Пшихачев Г. А., Егоров В. С. et al., in: NeurIPS'2021 Deep Reinforcement Learning Workshop. [s.n.], 2021.

Despite the numerous breakthroughs achieved with Reinforcement Learning (RL), solving environments with sparse rewards remains a challenging task that requires sophisticated exploration. Learning from Demonstrations (LfD) remedies this issue by guiding the agent’s exploration towards states experienced by an expert. Naturally, the benefits of this approach hinge on the quality of demonstrations, which are rarely optimal ...

Added: March 24, 2022

NeurIPS'2021 Deep Reinforcement Learning Workshop

[s.n.], 2021.

Added: March 24, 2022

21st IEEE International Conference on Data Mining Workshops, ICDMW 2021

IEEE Computer Society, 2021.

The 21st IEEE International Conference on Data Mining (IEEE ICDM 2021) is a premier and truly international conference for researchers and practitioners in the broad area of data mining. The ICDM Workshops program (IEEE ICDMW) aims to provide a platform for multiple workshops with a range of more focused topics to be discussed and explored, where attendees can present ...

Added: February 4, 2022