HSE University (National Research University Higher School of Economics)

 


TreeDQN: Sample-efficient off-policy reinforcement learning for combinatorial optimization

Knowledge-Based Systems. 2026. Vol. 348. Article 116258.
Sorokin D., Kostin A., Savchenko L., Gusev G., Savchenko A.

The Branch-and-Bound method is a convenient approach to solving combinatorial optimization tasks to optimality.
Its branching heuristic can be learned in order to solve a large set of similar tasks. Promising results have
been achieved by a recently proposed on-policy reinforcement learning method based on the tree Markov Decision
Process (tree MDP). To overcome its main disadvantages, namely very long training time and unstable training, we
propose TreeDQN (Tree Deep Q-Network), a sample-efficient off-policy RL method trained by optimizing the
geometric mean of expected return. To support the training procedure theoretically, we prove
the contraction property of the Bellman operator for the tree MDP. As a result, our method requires up to
10 times less training data and performs faster than known on-policy methods on synthetic tasks. Moreover,
TreeDQN significantly outperforms state-of-the-art techniques on a challenging practical task from the
ML4CO competition.
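
The phrase "optimizing the geometric mean of expected return" can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not state: that each Branch-and-Bound node gives a reward of -1, so the return is minus the tree size, and that the return of a node in the tree MDP is its own reward plus the returns of both child subtrees. The names `tree_return`, `squared_log_error` and `log_space_td_loss`, and the sample tree sizes, are hypothetical and are not taken from the paper. The sketch only shows the general fact that a squared error in log space is minimized by the geometric mean, which is far less sensitive to a few very large trees than the arithmetic mean.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def tree_return(node):
    """Return of a subtree, assuming a reward of -1 per node (an assumption).

    `node` is a hypothetical nested tuple (left, right); None means no child.
    The value decomposes over both children, as in a tree-structured MDP.
    """
    if node is None:
        return 0
    left, right = node
    return -1 + tree_return(left) + tree_return(right)


def squared_log_error(prediction, targets):
    """Mean squared difference between log(prediction) and log(target)."""
    return sum((math.log(prediction) - math.log(t)) ** 2 for t in targets) / len(targets)


def log_space_td_loss(q_pred, reward, q_left, q_right):
    """Illustrative log-space loss against a tree Bellman target.

    All values are negative (minus tree sizes), so magnitudes are compared.
    This is a sketch of the idea, not the loss used by the authors.
    """
    target = reward + q_left + q_right
    return (math.log(abs(q_pred)) - math.log(abs(target))) ** 2


leaf = (None, None)
small_tree = (leaf, leaf)  # root plus two leaves

tree_sizes = [3, 5, 9, 1001]  # hypothetical heavy-tailed tree sizes
gm = geometric_mean(tree_sizes)
am = sum(tree_sizes) / len(tree_sizes)

print("return of small_tree:", tree_return(small_tree))
print("arithmetic mean:", am, "geometric mean:", round(gm, 2))
print("log-space error at gm:", squared_log_error(gm, tree_sizes))
print("log-space error at am:", squared_log_error(am, tree_sizes))
```

With these sample sizes, the single tree of 1001 nodes dominates the arithmetic mean, while the geometric mean stays close to the typical tree size; this is one plausible reason to prefer a log-space objective when tree sizes are heavy-tailed.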

Research target: Computer Science
Language: English
Keywords: Markov process; Reinforcement learning; Mixed integer linear programs; Tree Markov Decision Process; Bellman operator’s contraction
Similar publications
Proceedings of the 6th Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences (CODI 2025)
Strube M., Braud C., Hardmeier C. et al., Suzhou: Association for Computational Linguistics, 2025.
Added: June 11, 2026
Microbial diversity and production of milk spirit using traditional Buryat fermentation and distillation technologies
Namsaraev Z., Nanzatov B., Kozlova A. et al., Scientific Reports 2026 Vol. 16 No. 1 Article 17769
Distilled fermented milk beverages are rare in food technology, despite the global prevalence of plant-based spirits. Currently, the production of distilled strong alcoholic beverages from fermented milk using traditional technologies is known only among Mongolic-speaking peoples and their Siberian neighbors. This study provides the first interdisciplinary analysis of darasun, a traditional Buryat spirit made from fermented ...
Added: June 10, 2026
Artificial intelligence and digital twins for failure prediction in data center cooling systems: a comprehensive literature review (2018–2026)
Butorova A., Bobakov V., Sergeev A. et al., European Physical Journal: Special Topics 2026 P. 1–19
This paper presents a review of artificial intelligence (AI) methods for failure prediction in data center cooling systems, with a focus on the integration of digital twins (DTs), physics-informed learning, and graph-based models. Positioned within complex network science, this review addresses a limitation of conventional graph approaches—their reliance on pairwise connectivity—whereas real-world failures often arise ...
Added: June 10, 2026
Innovations in Information and Decision Sciences. Proceedings of the 13th International Conference on Frontiers in Intelligent Computing: Theory and Applications (FICTA 2025), Volume 4
Springer, 2026.
The book presents the proceedings of the 13th International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA 2024), held at Intelligent Systems Research Group (ISRG), London Metropolitan University, London, United Kingdom, during June 6–7, 2025. Researchers, scientists, engineers and practitioners exchange new ideas and experiences in the domain of intelligent computing theories with ...
Added: June 8, 2026
Proceedings of the 43rd International Conference on Machine Learning (ICML 2026)
Seoul: PMLR, 2026.
Added: June 4, 2026
OpenAtom Foundation: a consortium developing open source in China
Silakov D., Системный администратор 2026 No. 3 P. 28–33
In our article on platforms for open-source software development in China, we described GitCode, a young project positioned as a platform for developers from all over the world. GitCode currently hosts projects created in the PRC, but some of them are already known internationally. The OpenAtom foundation is intended to help open projects become established, grow, and expand their audience ...
Added: June 2, 2026
The recognition-by-components method
Slivnitsin P., Mylnikov L., Engineering Applications of Artificial Intelligence 2026 Vol. 179 Article 115185
The paper describes an applied artificial intelligence task: the recognition-by-components method for real objects, based on the recognition of a limited set of primitives or components. Recognition-by-components makes it possible to determine the components that compose an object and to increase the number of recognizable objects without degrading the recognition quality. Training is performed on ...
Added: May 29, 2026
Brain-Computer Interfaces for Gait Rehabilitation After Stroke: A Scoping Review
Mokienko O., Zisman M. A., Bobrov P. et al., American Journal of Physical Medicine and Rehabilitation 2026 Vol. 105 No. 6 P. 555–563
Brain-computer interfaces (BCIs) represent a promising technology for restoring lower limb motor functions and gait after stroke. The application of BCIs in this field is supported by a limited number of studies. The objective of the review was to systematically and critically evaluate the current evidence on the use of BCIs for lower limb function ...
Added: May 28, 2026
Generalizing the Brady-Yong Algorithm: Efficient Fast Hough Transform for Arbitrary Image Sizes
Kazimirov D., Rybakova E., Vitalii V. Gulevskii et al., IEEE Access 2025 Vol. 13 P. 20101–20132
The Hough (discrete Radon) transform (HT/DRT) is a digital image processing tool that has become indispensable in many application areas, ranging from general image processing to neural networks and X-ray computed tomography. The utilization of the HT in applied problems demands its computational efficiency and increased accuracy. The de facto standard algorithm for the fast ...
Added: May 28, 2026
Universal Comparison Methodology for Hough Transform Approaches
Kazimirov D., Vitalii Gulevskii, Kroshnin A. et al., Mathematics 2026 Article 1136
The Hough transform (HT) is widely used in computer vision, tomography, and neural networks. Numerous algorithms for HT computation have been proposed, making their systematic comparison essential. However, existing comparative methodologies are either non-universal and limited to certain HT formulations, or task-oriented, relying on application-specific criteria that do not fully capture algorithmic properties. This paper ...
Added: May 28, 2026
Information Technologies and Technical Means of Control (ICCT-2024)
Moscow: V.A. Trapeznikov Institute of Control Sciences of the Russian Academy of Sciences, 2024.
The collection contains the proceedings of the VIII International Scientific Conference "Information Technologies and Technical Means of Control" (ICCT-2024). The conference addressed issues concerning the prospects for the development of scientific instrumentation in telecommunication and control systems, biomedical informatics, hardware and software of information and communication systems, reliability, diagnostics and non-destructive testing, control and automation systems, digital ecosystems, production and logistics management, methods of mathematical ...
Added: May 27, 2026
Non-linear in-band interference cancellation on base of conjugate gradients method
Degtyarev A., Bakhurin S., Yudin N., DSPA 2026 P. 1–6
This paper investigates one possible solution to the problem of self-interference cancellation (SIC) arising in the design of in-band full-duplex (IBFD) communication systems. Self-interference cancellation is performed in the digital domain using multilayer nonlinear models adapted via gradient-based optimization. The presence of local minima and saddle points during the adaptation of multilayer models limits the ...
Added: May 26, 2026
28th European Conference on Artificial Intelligence, 25-30 October 2025, Bologna, Italy – Including 14th Conference on Prestigious Applications of Intelligent Systems (PAIS 2025)
IOS Press, 2025.
Added: May 26, 2026
Comparative Study of Training Methods and Architectures of Echo State Networks
Androsov I., Proceedings of the Institute for System Programming of the RAS 2026 Vol. 38 No. 3 P. 87–114
This paper examines echo state networks (ESNs), one of the most prevalent approaches to implementing reservoir computing. An ESN consists of a recurrent neural network with fixed (untrained) weights and a readout layer that is typically linear and trainable. This approach enables the creation of energy-efficient and computationally efficient neural networks capable of real-time learning. However, since ...
Added: May 26, 2026
UVIP: Model-Free Approach to Evaluate Reinforcement Learning Algorithms
Belomestny D., Levin I., Naumov A. et al., Journal of Optimization Theory and Applications 2026 Vol. 208 Article 89
Policy evaluation is an important instrument for the comparison of different algorithms in Reinforcement Learning (RL). However, even a precise knowledge of the value function Vπ corresponding to a policy π does not provide reliable information on how far the policy π is from the optimal one. We present a novel model-free upper value iteration ...
Added: February 10, 2026
Impact of self-learning based high-frequency traders on the stock market
Mansurov K., Semenov A., Dmitry Grigoriev et al., Expert Systems with Applications 2023 Vol. 232 Article 120567
In this paper we investigate the role of self-learning agents in multi-agent models of financial markets. We develop an agent-based simulation model of a financial market and, in addition to the agents with fixed strategies used in previous research, we introduce an agent with a self-learning strategy. To model the behavior of such an agent, ...
Added: July 11, 2025
Cryptocurrency Exchange Simulation
Mansurov K., Semenov A., Dmitry Grigoriev et al., Computational Economics 2024 Vol. 64 P. 2585–2603
In this paper, we consider the approach of applying state-of-the-art machine learning algorithms to simulate some financial markets. In this case, we choose the cryptocurrency market based on the assumption that such markets are more active today. As a rule, they have more volatility, attracting riskier traders. Considering classic trading strategies, we also introduce an agent with a ...
Added: July 11, 2025
Optimal Approximation of Average Reward Markov Decision Processes
Sapronov Y., Yudin N., Computational Mathematics and Mathematical Physics 2025 Vol. 65 No. 3 P. 567–581
We continue to develop the concept of studying the ε-optimal policy for Average Reward Markov Decision Processes (AMDP) by reducing it to Discounted Markov Decision Processes (DMDP). Existing research often stipulates that the discount factor must not fall below a certain threshold. Typically, this threshold is close to one, and as is well-known, iterative methods ...
Added: June 10, 2025
The beer game bullwhip effect mitigation: a deep reinforcement learning approach
Rozhkov M., Alyamovskaya N., Zakhodiakin G., International Journal of Production Research 2025 Vol. 63 No. 18 P. 6630–6647
This article investigates the application of reinforcement learning (RL) methods to optimise a four-echelon linear supply chain model with stochastic demand. The proposed supply chain configuration is largely based on the production-distribution supply chain of the MIT Supply Chain Beer Game. We show that RL can significantly improve ordering efficiency and overall supply chain performance. ...
Added: March 24, 2025
Optimization of the Accelerator Control by Reinforcement Learning: A Simulation-Based Approach
Ibrahim A., Derkach D., Petrenko A. et al., Physics of Particles and Nuclei 2025 Vol. 56 No. 6 P. 1476–1481
Optimizing accelerator control is a critical challenge in experimental particle physics, requiring significant manual effort and resource expenditure. Traditional tuning methods are often time-consuming and reliant on expert input, highlighting the need for more efficient approaches. This study aims to create a simulation-based framework integrated with Reinforcement Learning (RL) to address these challenges. Using Elegant ...
Added: March 16, 2025
Computational modeling of affective processes in cognitive control
Баланина С. Н., Berezner T., in: Psychology of Cognition: Proceedings of the All-Russian Scientific Conference. YarSU, December 6–8, 2024. Proceedings of the All-Russian Scientific Conference in Memory of J. S. Bruner. Yaroslavl: P. G. Demidov Yaroslavl State University, 2024. P. 45–48.
In this work we propose a method for modeling the emotional reaction evoked by stimuli in the Stroop task. Our model reflects how the valence of the evoked reaction, that is, the affective evaluation of the stimulus, changes over the course of the experiment. We used a model from the class of reinforcement learning algorithms developed by Silvetti et al. (Silvetti et al., 2018). The simulation results confirmed that at first the affective evaluation is higher ...
Added: December 28, 2024
Comparing experience- and description-based economic preferences across 11 countries
Anlló H., Bavard S., Benmarrakchi F. et al., Nature Human Behaviour 2024 Vol. 6 No. 8 P. 1554–1567
Recent evidence indicates that reward value encoding in humans is highly context dependent, leading to suboptimal decisions in some cases, but whether this computational constraint on valuation is a shared feature of human cognition remains unknown. Here we studied the behaviour of n = 561 individuals from 11 countries of markedly different socioeconomic and cultural makeup. Our findings ...
Added: July 17, 2024
Queueing Theory
Ivchenko G., Kashtanov V., Коваленко И. Н., Moscow: URSS Publishing Group, 2022.
This textbook presents the elements of the main areas of queueing theory. It gives a general characterization of queueing systems and covers such parts of the theory as asymptotic methods, priority systems, statistics of queueing systems, and simulation of queueing systems. The second edition of the book includes a supplement describing semi-Markov processes. The chapter on Markov queueing models has been extended with a section that considers a system with repeated calls. ...
Added: January 4, 2023
A survey of neural network methods for code analysis and generation
С. М. Авдошин, Г. А. Арутюнов, Информационные технологии 2022 Vol. 28 No. 7 P. 378–391
The global pandemic has highlighted the shortage of human resources in the information technology sector. According to analysts' estimates, the shortage of IT specialists in Russia in 2021 was between 500 thousand and 1 million people. Educating and bringing to market such a large number of personnel may take years. The task of optimizing the process of ...
Added: June 11, 2022