Artificial Intelligence for Prosthetics: Challenge Solutions
In this work, we study the effect of combining existing improvements for Deep Q-Networks (DQN) in Markov Decision Process (MDP) and Partially Observable MDP (POMDP) settings. Combinations of several heuristics for MDPs, such as Distributional Learning and Dueling architectures, are well studied. We propose a new method for combining simple DQN extensions and develop a new model-free reinforcement learning agent that works in POMDPs and uses well-studied improvements from the fully observable MDP setting. To test our agent, we choose the Health Gathering scenario of the VizDoom environment, which is based on an old first-person shooter. We show empirically that improvements used in the MDP setting can be used in the POMDP setting as well, and that our combined agents can converge to better policies. We develop an agent that combines several improvements and shows superior game performance in practice. We compare our agent with a Recurrent DQN agent using Prioritized Experience Replay and Snapshot Ensembling, and obtain an approximately threefold increase in per-episode reward.
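As an illustration of one of the DQN extensions named above, the following is a minimal sketch of the standard Dueling aggregation step, Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'). It is not the agent described in this work: the function name `dueling_q_values` and the example numbers are hypothetical, and a real agent would compute the value and advantage streams with a neural network rather than receive them as plain Python numbers.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine a scalar state value V(s) with per-action advantages A(s, a).

    Subtracting the mean advantage makes the decomposition identifiable:
    the mean of the returned Q-values equals the state value.
    """
    if not advantages:
        raise ValueError("advantages must contain at least one action")
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


# Hypothetical outputs of the two streams for a state with three actions.
q_values = dueling_q_values(1.0, [0.5, -0.5, 0.0])

# Greedy action selection over the combined Q-values.
greedy_action = max(range(len(q_values)), key=q_values.__getitem__)
```

Because the aggregation only changes how the final Q-values are formed, it can in principle sit on top of either a feed-forward or a recurrent network body, which is why such extensions are candidates for reuse in the partially observable setting.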