Maximum Entropy Model-based Reinforcement Learning

Svidchenko O., Shpilman A.

Recent advances in reinforcement learning have demonstrated its ability to solve hard agent-environment interaction tasks at a superhuman level. However, the application of reinforcement learning methods to practical and real-world tasks is currently limited by the sample inefficiency of most state-of-the-art RL algorithms, i.e., their need for a vast number of training episodes. For example, the OpenAI Five algorithm that beat human players in Dota 2 was trained for thousands of years of game time. Several approaches tackle the issue of sample inefficiency: they either offer a more efficient use of already gathered experience or aim to gain more relevant and diverse experience through better exploration of the environment. However, to our knowledge, no such approach exists for model-based algorithms, which have demonstrated high sample efficiency in solving hard control tasks with high-dimensional state spaces. This work connects exploration techniques and model-based reinforcement learning. We design a novel exploration method that takes the features of the model-based approach into account. We also demonstrate through experiments that our method significantly improves the performance of the model-based algorithm Dreamer.
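The abstract describes combining an exploration bonus with a learned world model. As a minimal, hypothetical sketch of the general idea only (not the authors' actual method): one common way to express "maximum-entropy" exploration in a model-based setting is to add the entropy of the model's predictive distribution to the task reward, so the agent is drawn toward states where the model is uncertain. The function names, the diagonal-Gaussian assumption, and the `beta` weight below are all illustrative.

```python
import numpy as np

def gaussian_entropy(log_std: np.ndarray) -> float:
    """Differential entropy of a diagonal Gaussian with the given log-stds:
    H = 0.5 * d * (1 + ln(2*pi)) + sum(log_std)."""
    d = log_std.size
    return 0.5 * d * (1.0 + np.log(2.0 * np.pi)) + float(log_std.sum())

def augmented_reward(extrinsic: float, log_std: np.ndarray, beta: float = 0.1) -> float:
    """Task reward plus an entropy bonus; beta trades off exploration vs. exploitation."""
    return extrinsic + beta * gaussian_entropy(log_std)

# A confident model prediction (small std) earns a smaller bonus
# than an uncertain one (large std), steering exploration toward
# poorly modelled regions of the state space.
confident = augmented_reward(1.0, np.full(4, -2.0))
uncertain = augmented_reward(1.0, np.full(4, 1.0))
assert uncertain > confident
```

In an actual model-based agent such as Dreamer, the predictive distribution would come from the learned latent dynamics model rather than being supplied by hand; this sketch only shows the shape of the reward augmentation.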

Language: English
Keywords: deep reinforcement learning

In book

NeurIPS'2021 Deep Reinforcement Learning Workshop
[s.n.], 2021.
Similar publications
Learning-Based UAV–RIS Secure Communication Under Eavesdropper Location Uncertainty
Suleiman E., Dayoub A., in: Proceedings of the 2026 8th International Youth Conference on Radio Electronics, Electrical and Power Engineering (REEPE). IEEE, 2026. Ch. 165. P. 1–6.
Unmanned aerial vehicle (UAV)-assisted reconfigurable intelligent surface (RIS) systems can enhance physical layer security through joint mobility and propagation control. However, most existing designs assume the availability of the eavesdropper's channel state information (CSI), which is unrealistic in passive eavesdropping scenarios. In this paper, secure UAV-RIS downlink communication is studied under bounded eavesdropper location uncertainty, ...
Added: April 30, 2026
Optical stabilization for laser communication satellite systems through proportional–integral–derivative (PID) control and reinforcement learning approach
Bakhshaliev R. M., Reutov A., Vorobey S. et al., Review of Scientific Instruments 2025 Vol. 96 No. 3
One of the main issues in satellite-to-ground optical communication, including free-space satellite quantum key distribution (QKD), is achieving reasonable accuracy of positioning, navigation, and optical stabilization. Proportional–integral–derivative (PID) controllers can handle various control tasks in optical systems. Recent research shows promising results in the area of composite control systems, including ...
Added: May 13, 2025
Optimization of the Accelerator Control by Reinforcement Learning: A Simulation-Based Approach
Ibrahim A., Derkach D., Petrenko A. et al., Physics of Particles and Nuclei 2025 Vol. 56 No. 6 P. 1476–1481
Optimizing accelerator control is a critical challenge in experimental particle physics, requiring significant manual effort and resource expenditure. Traditional tuning methods are often time-consuming and reliant on expert input, highlighting the need for more efficient approaches. This study aims to create a simulation-based framework integrated with Reinforcement Learning (RL) to address these challenges. Using Elegant ...
Added: March 16, 2025
Adaptive Algorithm for Selecting the Optimal Trading Strategy Based on Reinforcement Learning for Managing a Hedge Fund
Belyakov B., Sizykh D., IEEE Access 2024 Vol. 12 P. 189047–189063
In hedge fund management, the ability to dynamically select optimal trading strategies is paramount for maximizing returns and mitigating risk. This paper presents a pioneering approach that integrates Reinforcement Learning (RL), specifically the Proximal Policy Optimization (PPO) algorithm, into the strategy selection process for hedge fund management. Our model considers a diverse array of strategies, ...
Added: January 15, 2025
Improving GFlowNets with Monte Carlo Tree Search
Morozov N., Tiapkin D., Samsonov S. et al., in: ICML 2024 Workshop on Structured Probabilistic Inference & Generative Modeling. OpenReview, 2024.
Added: October 24, 2024
When to Switch: Planning and Learning for Partially Observable Multi-Agent Pathfinding
Skrynnik A., Andreychuk A., Yakovlev K. et al., IEEE Transactions on Neural Networks and Learning Systems 2024 Vol. 35 No. 12 P. 17411–17424
Multi-agent pathfinding (MAPF) is a problem that involves finding a set of non-conflicting paths for a set of agents confined to a graph. In this work, we study a MAPF setting, where the environment is only partially observable for each agent, i.e., an agent observes the obstacles and other agents only within a limited field-of-view. ...
Added: December 4, 2023
Dealing With Sparse Rewards Using Graph Neural Networks
Gerasyov M., Makarov I., IEEE Access 2023 Vol. 11 P. 89180–89187
Deep reinforcement learning in partially observable environments is a difficult task in itself and can be further complicated by a sparse reward signal. Most tasks involving navigation in three-dimensional environments provide the agent with minimal information. Typically, the agent receives a visual observation input from the environment and is rewarded once at the end of ...
Added: August 28, 2023
Artificial Intelligence and Mathematical Models of Power Grids Driven by Renewable Energy Sources: A Survey
Srinivasan S., Kumarasamy S., Andreadakis Z. et al., Energies 2023 Vol. 16 No. 14 Article 5383
To face the impact of climate change on all dimensions of our society in the near future, the European Union (EU) has established an ambitious target: by 2050, the share of renewable power shall increase to up to 75% of all power injected into today's power grids. While being clean and having become significantly cheaper, renewable ...
Added: July 17, 2023
Self-Imitation Learning from Demonstrations
Ivanov D., Pshikhachev G. A., Egorov V. S. et al., in: NeurIPS'2021 Deep Reinforcement Learning Workshop. [s.n.], 2021.
Despite the numerous breakthroughs achieved with Reinforcement Learning (RL), solving environments with sparse rewards remains a challenging task that requires sophisticated exploration. Learning from Demonstrations (LfD) remedies this issue by guiding agent’s exploration towards states experienced by an expert. Naturally, the benefits of this approach hinge on the quality of demonstrations, which are rarely optimal ...
Added: March 24, 2022
NeurIPS'2021 Deep Reinforcement Learning Workshop
[s.n.], 2021.
Added: March 24, 2022
21st IEEE International Conference on Data Mining Workshops, ICDMW 2021
IEEE Computer Society, 2021.
The 21st IEEE International Conference on Data Mining (IEEE ICDM 2021) is a premier and truly international conference for researchers and practitioners in the broad area of data mining. The ICDM Workshops program (IEEE ICDMW) aims to provide a platform for multiple workshops with a range of more focused topics to be discussed and explored, where attendees can present ...
Added: February 4, 2022