Dealing With Sparse Rewards Using Graph Neural Networks

Gerasyov Matvey; I. Makarov

doi:10.1109/ACCESS.2023.3305927

IEEE Access. 2023. Vol. 11. P. 89180–89187.

Deep reinforcement learning in partially observable environments is a difficult task in itself and can be further complicated by a sparse reward signal. Most tasks involving navigation in three-dimensional environments provide the agent with minimal information. Typically, the agent receives a visual observation input from the environment and is rewarded once at the end of the episode. A good reward function could substantially improve the convergence of reinforcement learning algorithms for such tasks. The classic approach to increasing the density of the reward signal is to augment it with supplementary rewards. This technique is called reward shaping. In this study, we propose two modifications of one of the recent reward shaping methods based on graph convolutional networks: the first involving advanced aggregation functions, and the second utilizing the attention mechanism. We empirically validate the effectiveness of our solutions for the task of navigation in a 3D environment with sparse rewards. For the solution featuring the attention mechanism, we can also show that the learned attention is concentrated on edges corresponding to important transitions in the 3D environment.
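As a rough illustration of reward shaping in general, and not of the method proposed in the paper, the sketch below uses potential-based shaping, a standard form in which the supplementary reward is the discounted change in a potential function over states. The function name `shaped_reward`, the corridor potential `phi`, and the goal position are all hypothetical. The paper's reward shaping is based on graph convolutional networks, which this sketch does not attempt to reproduce.

```python
from typing import Callable, Hashable


def shaped_reward(
    reward: float,
    state: Hashable,
    next_state: Hashable,
    potential: Callable[[Hashable], float],
    gamma: float = 0.99,
) -> float:
    """Return the environment reward plus a potential-based shaping term.

    The shaping term is gamma * potential(next_state) - potential(state),
    so transitions that raise the potential receive a positive bonus.
    """
    return reward + gamma * potential(next_state) - potential(state)


# Hypothetical potential for a 1-D corridor: negative distance to the goal cell.
GOAL = 5


def phi(state: int) -> float:
    return -abs(GOAL - state)


# The sparse environment reward is 0 on both transitions; only the shaping differs.
toward_goal = shaped_reward(0.0, state=2, next_state=3, potential=phi, gamma=1.0)
away_from_goal = shaped_reward(0.0, state=3, next_state=2, potential=phi, gamma=1.0)
print(toward_goal, away_from_goal)
```

With this toy potential, a step toward the goal earns a positive supplementary reward and a step away earns a negative one, even though the environment itself returns nothing until the episode ends.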

Research target: Computer Science

Keywords: graph neural networks, deep reinforcement learning, partially observable Markov decision process, reward shaping

Publication based on the results of:

Efficient algorithms for computer vision and facial image processing (2023)

When to Switch: Planning and Learning for Partially Observable Multi-Agent Pathfinding

Skrynnik A., Andreychuk A., Yakovlev K. et al., IEEE Transactions on Neural Networks and Learning Systems 2023 P. 1–14

Multi-agent pathfinding (MAPF) is a problem that involves finding a set of non-conflicting paths for a set of agents confined to a graph. In this work, we study a MAPF setting, where the environment is only partially observable for each agent, i.e., an agent observes the obstacles and other agents only within a limited field-of-view. ...

Added: December 4, 2023

ST-GRAT: A Novel Spatio-Temporal Graph Attention Networks for Accurately Forecasting Dynamically Changing Road Speed

Park C., Lee C., Bahng H. et al., CIKM: ACM International Conference on Information & Knowledge Management (USA) 2020 P. 1215–1224

Predicting road traffic speed is a challenging task due to different types of roads, abrupt speed change and spatial dependencies between roads; it requires the modeling of dynamically changing spatial dependencies among roads and temporal patterns over long input sequences. This paper proposes a novel spatio-temporal graph attention (ST-GRAT) that effectively captures the spatio-temporal dynamics ...

Added: May 18, 2023

Comparative Analysis of Logic Reasoning and Graph Neural Networks for Ontology-Mediated Query Answering with a Covering Axiom

Gerasimova O., Makarov I., Severin N., IEEE Access 2023 Vol. 11 P. 88074–88086

The problem of query answering over incomplete attributed graph data is a challenging field of database management systems and artificial intelligence. When there are rules on data structure expressed in the form of an ontology, the theoretical complexity of finding an exact solution satisfying the ontology constraints increases. Logic-based methods use theoretical constructions to obtain efficient rewritings ...

Added: January 5, 2024

Exploration in Sequential Recommender Systems via Graph Representations

Kiselev D., Makarov I., IEEE Access 2022 Vol. 10 P. 123614–123621

Temporal graph networks are powerful tools for solving the cold-start problem in sequential recommender systems. However, graph models are susceptible to feedback loops and data distribution shifts. The paper proposes a simple yet efficient graph-based exploration method for the mitigation of the issues above. It adopts the counter-based state exploration from reinforcement learning to the ...

Added: September 5, 2022

ICML 2022 Workshop: Principles of Distribution Shift (PODS)

[s.n.], 2022.

The problem of out-of-distribution detection for graph classification is far from being solved. The existing models tend to be overconfident about OOD examples or completely ignore the detection task. In this work, we consider this problem from the uncertainty estimation perspective and perform the comparison of several recently proposed methods. In our experiment, we find ...

Added: December 12, 2023

A Review of Neural Network Methods for Code Analysis and Generation

С. М. Авдошин, Г. А. Арутюнов, Информационные технологии 2022 Vol. 28 No. 7 P. 378–391

The global pandemic has highlighted the shortage of human resources in the information technology sector. According to analysts' estimates, the shortage of IT specialists in Russia in 2021 was between 500 thousand and 1 million people. Training so many specialists and bringing them to the labor market may take years. The task of optimizing the process of ...

Added: June 11, 2022

21st IEEE International Conference on Data Mining Workshops, ICDMW 2021

IEEE Computer Society, 2021.

The 21st IEEE International Conference on Data Mining (IEEE ICDM 2021) is a premier and truly international conference for researchers and practitioners in the broad area of data mining. The ICDM Workshops program (IEEE ICDMW) aims to provide a platform for multiple workshops with a range of more focused topics to be discussed and explored, where attendees can present ...

Added: February 4, 2022

Analysis of the Transmission Spectra of Optical Microcavities Using the Mode Broadening Method

Ружицкая Д. Д., Самойленко А. А., Иванов А. Д. et al., Optoelectronics, Instrumentation and Data Processing 2017 Vol. 54 No. 1 P. 1–8

This paper presents an algorithm for processing the transmission spectra of whispering-gallery optical microcavities for use as a nanoparticle detector. The algorithm is based on the broadening of the microcavity resonance curve during precipitation of nanoparticles on the microcavity surface. Experimental results on the detection of particles are compared with Langmuir adsorption theory. The contribution ...

Added: May 25, 2018

Operating Systems: Textbook and Practicum

Gostev I. M., Moscow: Юрайт, 2016.

Computer science is currently developing rapidly. New versions of operating systems appear every one and a half to two years, so it was decided to include in this book material that will not become outdated. The textbook covers some of the most general principles of operating system design, which were developed more than 50 years ago and have remained practically unchanged since then. ...

Added: October 13, 2009

Probably approximately correct learning of Horn envelopes from queries

Borchmann D., Hanika T., Obiedkov S., Discrete Applied Mathematics 2020 Vol. 273 P. 30–42

We propose an algorithm for learning the Horn envelope of an arbitrary domain using an expert, or an oracle, capable of answering certain types of queries about this domain. Attribute exploration from formal concept analysis is a procedure that solves this problem, but the number of queries it may ask is exponential in the size ...

Added: October 29, 2019

A Parallel Algorithm to Detect Structural Breaks in Time Series

Furmanov K. K., Nikol'skii I. M., Computational Mathematics and Modeling 2016 Vol. 27 No. 2 P. 247–253

Added: December 22, 2016

Assessment of the Busy Time of Firefighting Crews and the Risks of Their Late Arrival at the Protected Facility

Litvin Y. V., Абрамов И. В., Технологии техносферной безопасности 2016 No. 66

An advanced approach is presented for assessing the random arrival time of firefighting crews at the protected facility, the time during which the crews are occupied, and the free-burning time. Some quantitative assessments are given, together with a review of analytical methods and simulation ...

Added: August 27, 2016

Measurement of antiproton production from antihyperon decays in pHe collisions at √s_NN = 110 GeV

Aaij R., Abdelmotteleb A. S., Abellan Beteta C. et al., The European Physical Journal C - Particles and Fields 2023 Vol. 83 Article 543

https://link.springer.com/article/10.1140/epjc/s10052-023-11673-x#Abs1 ...

Added: December 4, 2023

On Some Slowly Terminating Term Rewriting Systems

Beklemishev L. D., Оноприенко А. А., Математический сборник 2015 Vol. 206 No. 9 P. 3–20

We formulate some term rewriting systems in which the number of computation steps is finite for each output, but this number cannot be bounded by a provably total computable function in Peano arithmetic PA. Thus, the termination of such systems is unprovable in PA. These systems are derived from an independent combinatorial result known as the Worm ...

Added: March 13, 2016

Proceedings of the NI Academic Days 2017 Conference, Moscow, April 13–14, 2017

Moscow: National Instruments Russia, 2017.

The collection consists of previously unpublished reports presenting the results of original research and technical solutions. We hope that it will prove useful to specialists working in various fields of science and technology, to a wide range of university lecturers, graduate students, and undergraduates, and to teachers at secondary schools and technical colleges. ...

Added: May 10, 2017

Self-Organization in Network Sociotechnical Systems

Svetlana Maltseva, Kornilov V., Barakhnin V. et al., Complexity 2022 Vol. 2022 Article 5714395

We can observe self-organization properties in various systems. However, modern networked dynamical sociotechnical systems have some features that allow for realizing the benefits of self-organization in a wide range of systems in economic and social areas. The review examines the general principles of self-organized systems, as well as the features of the implementation of self-organization ...

Added: April 18, 2022

Database on the Bandgap of Inorganic Substances and Materials

Kiselyova N. N., Dudarev V.A., Korzhuev M. A., Inorganic Materials: Applied Research 2016 Vol. 7 No. 1 P. 34–39

A database (DB) on the bandgap of inorganic substances available via the Internet (http://bg.imetdb.ru) was developed for the information service of specialists in the sphere of inorganic chemistry and materials science. The DB is integrated with other information systems on the properties of inorganic substances and materials, which provides the search of a wide range ...

Added: February 23, 2016

Knowledge Management System in the Strategic Management of a University

Dneprovskaya N., Шевцова И. В., Бизнес-информатика 2023 Vol. 17 No. 2 P. 20–40

The purpose of this study is a conceptual description of the implementation of knowledge management systems (KMS) as a mechanism for universities’ strategic development. Knowledge management (KM) practice from around the world proved the positive influence of KMS on productivity of educational institutions. The theoretical provisions and concept for KMS are determined based on an ...

Added: August 2, 2023

Technologies, Measurements, and Testing in the Field of Electromagnetic Compatibility. Proceedings of the VII All-Russian Scientific and Technical Conference "ТехноЭМС-2020"

Moscow: Грифон, 2020.

The collection contains the materials of the VII All-Russian Conference "ТехноЭМС-2020", devoted to technologies, measurements, and testing in the field of electromagnetic compatibility. It is intended for specialists in the design of technical equipment and in electromagnetic compatibility, as well as for those engaged in testing and measurement in this field. ...

Added: May 7, 2020

First Measurement of the Z → μ⁺μ⁻ Angular Coefficients in the Forward Region of pp Collisions at √s = 13 TeV

Aaij R., Abdelmotteleb A. S., Abellán Beteta C. et al., Physical Review Letters 2022 Vol. 129 No. 9 Article 091801

The first study of the angular distribution of μ⁺μ⁻ pairs produced in the forward rapidity region via the Drell-Yan reaction pp → γ*/Z + X → l⁺l⁻ + X is presented, using data collected with the LHCb detector at a center-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 5.1 fb⁻¹. The coefficients ...

Added: December 29, 2022

Computational Linguistics and Intellectual Technologies: Proceedings of the Annual International Conference "Dialogue" (Moscow, May 29 – June 1, 2019). Issue 18 (25)

Moscow: Publishing Center of the Russian State University for the Humanities, 2019.

The collection includes 27 papers from the international conference on computational linguistics and intellectual technologies "Dialogue 2019" that were not included in the yearbook "Computational Linguistics and Intellectual Technologies" but were recommended by the Program Committee for presentation at the conference. It is intended for specialists in theoretical and applied linguistics and intellectual technologies. ...

Added: December 10, 2019

Formation of Control Structures in Static Swarms

Karpov V. E., Karpova I. P., Procedia Engineering 2015 Vol. 100 P. 1459–1468

Working solutions are proposed for the problems of leader selection and role distribution in homogeneous groups of robots. It is shown that the transition from a swarm to a collective of robots with hierarchical organization is possible using exclusively local interaction. The local revoting algorithm is central to the procedure for choosing a leader, while redistribution of roles can ...

Added: March 14, 2015

On the Choice of Software Tools for Cognitive Computer Visualization

Baibikova T., Domoratsky E., Вестник Московского финансово-юридического университета 2017 No. 1 P. 200–206

This paper considers some questions of scientific visualization. It also discusses the peculiarities of applying cognitive computer graphics and singles out a range of scientific visualization tasks. The paper gives a brief overview of modern support tools for program visualization, trends in their development, and their main characteristics. A module ...

Added: June 10, 2017

On One-Dimensional Projections of Polytopes of Discrete Optimization Problems

Vyalyi M., Дискретная математика 1991 Vol. 3 No. 3 P. 35–45

Added: October 17, 2014