Learning to Run with Potential-Based Reward Shaping and Demonstrations from Video Data

A. Shpilman, A. Malysheva, D. Kudenko

doi:10.1109/ICARCV.2018.8581310


P. 286–291.


Learning to produce efficient movement behaviour for humanoid robots from scratch is a hard problem, as illustrated by the “Learning to Run” competition at NIPS 2017. The goal of this competition was to train a two-legged model of a humanoid body to run along a simulated race course at maximum speed. All submissions took a tabula rasa approach to reinforcement learning (RL) and were able to produce relatively fast, but not optimal, running behaviour. In this paper, we demonstrate how data from videos of human running (e.g. taken from YouTube) can be used to shape the reward of the humanoid learning agent, speeding up learning and producing a better result. Specifically, we use the positions of key body parts at regular time intervals to define a potential function for potential-based reward shaping (PBRS). Since PBRS does not change the optimal policy, this approach allows the RL agent to overcome sub-optimalities in the human movements shown in the videos. We present experiments in which we combine selected techniques from the top ten approaches in the NIPS competition with further optimizations to create a high-performing agent as a baseline. We then demonstrate how video-based reward shaping improves performance further, resulting in an RL agent that runs twice as fast as the baseline after 12 hours of training. We furthermore show that our approach can overcome sub-optimal running behaviour in videos, with the learned policy significantly outperforming that of the running agent from the video.
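In standard PBRS the environment reward is augmented with F(s, s') = γΦ(s') − Φ(s). Over an episode the discounted sum of F telescopes to a term that depends only on the start state (when Φ is fixed to 0 at terminal states), which is why optimal policies are unchanged. The sketch below shows one plausible way a potential could be built from video-derived key body-part positions. It is a minimal illustration under stated assumptions, not the authors' implementation: the names `potential` and `shaped_reward`, the use of negative mean Euclidean distance as Φ, the 2-D coordinates, and the clamping to the last reference frame are all hypothetical choices.

```python
import math
from typing import Sequence, Tuple

# Hypothetical types: a key body part is an (x, y) position, and a pose is a
# fixed-order list of such parts (e.g. head, pelvis, knees, ankles).
Point = Tuple[float, float]
Pose = Sequence[Point]


def potential(pose: Pose, t: int, reference: Sequence[Pose], scale: float = 1.0) -> float:
    """Hypothetical potential Phi(s, t): minus the mean Euclidean distance between
    the agent's key body parts and the video-derived reference pose for step t.
    It is 0 for a perfect match and more negative the further the agent deviates."""
    # Past the end of the clip, keep comparing against the last reference frame.
    ref = reference[min(t, len(reference) - 1)]
    dists = [math.dist(p, q) for p, q in zip(pose, ref)]
    return -scale * sum(dists) / len(dists)


def shaped_reward(reward: float, pose: Pose, t: int, next_pose: Pose,
                  reference: Sequence[Pose], gamma: float = 0.99,
                  terminal: bool = False) -> float:
    """Environment reward plus the PBRS term F = gamma * Phi(s') - Phi(s).

    The time step t is treated as part of the state, so the time-indexed
    potential is still a function of the (augmented) state.
    """
    phi_s = potential(pose, t, reference)
    # Phi is fixed to 0 at terminal states so that, in episodic tasks, the
    # discounted sum of F depends only on the start state.
    phi_next = 0.0 if terminal else potential(next_pose, t + 1, reference)
    return reward + gamma * phi_next - phi_s


if __name__ == "__main__":
    # Toy reference clip: two frames of two key points (head, ankle).
    reference = [[(0.0, 1.0), (0.0, 0.0)], [(0.5, 1.0), (0.5, 0.0)]]
    # The agent matches frame 0, then lags 0.25 behind frame 1, so F is negative.
    lagging = [(0.25, 1.0), (0.25, 0.0)]
    print(shaped_reward(0.0, reference[0], 0, lagging, reference))
```

Treating the time step as part of the state is what lets a time-indexed reference trajectory serve as a potential; any bounded function of that augmented state would preserve the policy-invariance argument, so the distance measure and `scale` are free design choices that would need tuning against the environment reward.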

Language: English


Keywords: optimization, training, humanoid robots, task analysis

In:

2018 15th International Conference on Control, Automation, Robotics and Vision (ICARCV)

IEEE, 2018.

NeurIPS 2024 Optimization for ML Workshop

[s.n.], 2025.

Added: February 5, 2026.

Method of Automated Dataset Collection for Microwave Filters Synthesis

Arinin O. V., Bakhmach D. M., Кацнельсон А. И., et al., in: 2025 Systems of Signals Generating and Processing in the Field of on Board Communications. IEEE, 2025. P. 1–5.

Added: December 6, 2025.

Physics-Informed Bayesian Optimization for Conformational Ensemble Augmentation

Медведев М. Г., Journal of Chemical Information and Modeling 2025 Vol. 65 No. 12

Added: November 12, 2025.

Optimization of Multi-Currency Deposit Structure by Two Indicators (Income and Risk) under Uncertainty

Молоствов В. С., Advances in Systems Science and Applications 2025 Vol. 25 No. 1 P. 1–11

Added: August 26, 2025.

Numerical optimization of the parity-check matrix of an LDPC code for use in a quantum key distribution protocol using highly parallel computing

Морозов В. И., Башара В. О., Емельяненко М. В., in: Parallel Computational Technologies – XIX All-Russian Scientific Conference with International Participation, ПаВТ’2025, Moscow, April 8–10, 2025. Short papers and poster descriptions. Chelyabinsk: South Ural State University (SUSU) Publishing Center, 2025. P. 193–210.

Error correction of the secret key is a mandatory stage of quantum key distribution (QKD) protocols. Modern error-correcting codes are typically used to implement it. Imperfections in the hardware used in QKD systems give rise to bit errors in the channel. Moreover, such systems are characterized by an asymmetric distribution of these errors. Accounting for this asymmetry in the channel model not only makes it possible to increase ...

Added: June 3, 2025.

Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 3-5 May 2025, Splash Beach Resort in Mai Khao, Thailand, PMLR: vol. 258

PMLR, 2025.

Added: May 18, 2025.

Editorial

Panos Pardalos, Valery Kalyagin, Mario R. Guarracino, Computational Management Science 2024 Vol. 21 No. 1 Article 35

Big data has become an integral part of modern networks. With the increasing amount of data generated by devices, machines, and applications, networks are constantly being challenged to handle and process this data in a timely and efficient manner. The size, complexity, and variety of data in networks are increasing rapidly, which requires new approaches ...

Added: February 22, 2025.

Savage's Solution to the Problem of Three-Currency Deposit Diversification: Program Tools and Modeling Results

Молоствов В. С., Advances in Systems Science and Applications 2024 Vol. 24 No. 2 P. 103–115

This paper presents the development of computing tools for finding optimal structures of multi-currency deposits in terms of guaranteed risk under uncertain exchange rates. The approach utilizes Savage's minimax regret concept to calculate risk and guaranteed risk functions explicitly, assuming only the limits of possible changes in uncertain parameters are known. The Excel environment implements ...

Added: August 9, 2024.

The impact of digital technologies on business processes and competitive advantages of FMCG companies in Kazakhstan

Сизов М. В., Шушкин М. А., Информационное общество (Information Society) 2024 No. 6 P. 2–15

This study examines the impact of digital technologies on traditional business processes and on the competitive advantages of companies in Kazakhstan's FMCG sector, as well as their contribution to innovativeness and productivity. The work applied a systematic review of the literature on the research topic, whose results showed that digital business processes have a number of unique characteristics, including improved internal and external communications and the use of predictive ...

Added: May 14, 2024.

Random beliefs in Cournot competition

Dranov E., Федянин Д. Н., in: 2023 5th International Conference on Control Systems, Mathematical Modeling, Automation and Energy Efficiency (SUMMA). Vol. 5. IEEE, 2023. P. 464–469.

Added: February 24, 2024.

14th International Conference, OPTIMA 2023, Petrovac, Montenegro, September 18–22, 2023, Revised Selected Papers. Communications in Computer and Information Science (CCIS, volume 1913)

Springer, 2023.

Added: December 9, 2023.

Data Analysis and Optimization. In honor of Boris Mirkin’s 80th birthday

Cham: Springer, 2023.

Added: November 3, 2023.

PSIICOS projection optimality for EEG and MEG based functional coupling detection

Алтухов Д. И., Клеева Д. Ф., Осадчий А. Е., Neuroimage 2023 Vol. 280 Article 120333

Added: September 24, 2023.

Testing methods of inter-process data exchange on the JETSON TX2 supercomputer in comparison with other platforms

Смирнов И. А., КРАВЧЕНКО В. О., Разумов П. В., et al., ГНИИ "НацРазвитие", 2019.

This article examines various methods of inter-process data exchange and draws conclusions about the performance of each. The tests are conducted on different processors and different operating system versions. The study was carried out to find the fastest way to transfer data between processes on the Jetson TX2 supercomputer compared with other platforms. ...

Added: May 11, 2023.

22nd International Conference, MMST 2022, Nizhny Novgorod, Russia, November 14–17, 2022, Revised Selected Papers

Springer, 2022.

Added: December 26, 2022.

A New Interpolation-Based Polynomial Algorithm for Estimating Lateness in Single Machine Scheduling Problem

Лазарев А. А., Lemtyuzhnikova D. V., Tyunyatkin A. A., et al., IFAC-PapersOnLine 2022 Vol. 55 No. 10 P. 2881–2886

Added: December 5, 2022.

Variational Autoencoders for Precoding Matrices with High Spectral Efficiency

Bobrov E., Markov A., Panchenko S., et al., in: Mathematical Optimization Theory and Operations Research: Recent Trends. 21st International Conference, MOTOR 2022, Petrozavodsk, Russia, July 2–6, 2022, Revised Selected Papers. Springer, 2022. Ch. 22. P. 315–326.

Added: November 1, 2022.

An Achievability Bound of Energy Per Bit for Stabilized Massive Random Access Gaussian Channel

Burkov A., Shneer S., Turlikov A., IEEE Communications Letters 2021 Vol. 25 No. 1 P. 299–302

Added: October 28, 2022.

Recent Theoretical Advances in Decentralized Distributed Convex Optimization

Горбунов Э. А., Рогозин А. В., Безносиков А. Н., et al., in: High-Dimensional Optimization and Probability: With a View Towards Data Science. Springer, 2022. Ch. 191 P. 253–325.

Added: October 28, 2022.

High-Dimensional Optimization and Probability: With a View Towards Data Science

Springer, 2022.

Added: October 28, 2022.

Springer Optimization and Its Applications

Springer, 2022.

Added: October 28, 2022.