L-4812
2700, chemin de la Tour
Montréal (QC) Canada  H3T 1J4

Title: Temporal credit assignment in RNNs and improved RNN training for model-based RL

Speaker: Rosemary Nan Ke

Abstract: Recurrent neural networks (RNNs) are an important class of models for modeling sequences. I will talk about two of my recent works on RNN training. The first part covers temporal credit assignment in RNNs, and the second part covers how to use RNNs to predict the long-term future in model-based RL.

Recurrent neural networks are usually trained using backpropagation through time (BPTT). This requires credit information to be propagated backwards through every single step of the forward computation, potentially over thousands or millions of time steps. This becomes computationally expensive or even infeasible when used with long sequences. Importantly, biological brains are unlikely to perform such detailed reverse replay over very long sequences of internal states (consider days, months, or years). However, humans are often reminded of past memories or mental states which are associated with the current mental state. We consider the hypothesis that such memory associations between past and present could be used for credit assignment through arbitrarily long sequences, propagating the credit assigned to the current state to the associated past state. Based on this principle, we study a novel algorithm which only back-propagates through a few of these temporal skip connections, realized by a learned attention mechanism that associates current states with relevant past states. We demonstrate in experiments that our method matches or outperforms regular BPTT and truncated BPTT in tasks involving particularly long-term dependencies, but without requiring the biologically implausible backward replay through the whole history of states. Additionally, we demonstrate that the proposed method transfers to longer sequences significantly better than LSTMs trained with BPTT and LSTMs trained with full self-attention.
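To make the mechanism concrete, here is a minimal sketch in PyTorch of the idea described above; it is an illustration under our own simplifying assumptions, not the speaker's implementation, and names such as SparseSkipRNN and the parameter k are placeholders. The step-to-step recurrence is detached, and each step instead attends over stored past hidden states and mixes in only the top-k attended ones, so gradients reach the past solely through these sparse, learned skip connections rather than through a full backward replay.

import torch
import torch.nn as nn


class SparseSkipRNN(nn.Module):
    def __init__(self, input_size, hidden_size, k=3):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        self.query = nn.Linear(hidden_size, hidden_size)
        self.k = k

    def forward(self, x):                      # x: (seq_len, batch, input_size)
        seq_len, batch, _ = x.shape
        h = x.new_zeros(batch, self.cell.hidden_size)
        memory = []                            # past hidden states (kept attached)
        outputs = []
        for t in range(seq_len):
            # Detach the ordinary recurrent path: no step-by-step backward replay.
            h = self.cell(x[t], h.detach())
            if memory:
                mem = torch.stack(memory)                     # (t, batch, hidden)
                scores = (mem * self.query(h)).sum(-1)        # attention scores
                k = min(self.k, len(memory))
                top_val, top_idx = scores.topk(k, dim=0)      # sparse skip links
                weights = torch.softmax(top_val, dim=0)
                picked = mem.gather(
                    0, top_idx.unsqueeze(-1).expand(-1, -1, mem.size(-1)))
                # Gradients reach past states only through this sparse summary.
                h = h + (weights.unsqueeze(-1) * picked).sum(0)
            memory.append(h)
            outputs.append(h)
        return torch.stack(outputs)

Because the dense recurrence is cut with detach(), the cost of credit assignment in this sketch scales with the number of attended skip connections per step rather than with the full sequence length.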

In model-based reinforcement learning, the agent alternates between model learning and planning. These two components are inextricably intertwined. If the model is not able to provide sensible long-term predictions, the executed planner will exploit model flaws, which can yield catastrophic failures. This paper focuses on building a model that reasons about the long-term future and demonstrates how to use this for efficient planning and exploration. To this end, we build a latent-variable autoregressive model by leveraging recent ideas in variational inference. We argue that forcing latent variables to carry future information through an auxiliary task substantially improves long-term predictions. Moreover, by planning in the latent space, the planner's solution is ensured to be within regions where the model is valid. An exploration strategy can be devised by searching for unlikely trajectories under the model. Our method achieves higher reward faster compared to baselines on a variety of tasks and environments in both the imitation learning and model-based reinforcement learning settings.
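The sketch below (again a simplified illustration under assumptions of our own, not the paper's model; all module and variable names are placeholders) shows the kind of training objective this describes: a latent-variable sequence model where, in addition to reconstruction and KL terms, an auxiliary head forces each latent variable to predict a summary of the future, here the state of a backward-running RNN available only at training time.

import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentSeqModel(nn.Module):
    def __init__(self, obs_size, hidden_size, z_size):
        super().__init__()
        self.fwd = nn.GRU(obs_size, hidden_size)               # forward (causal) RNN
        self.bwd = nn.GRU(obs_size, hidden_size)                # backward RNN, training only
        self.prior = nn.Linear(hidden_size, 2 * z_size)         # p(z_t | h_t)
        self.post = nn.Linear(2 * hidden_size, 2 * z_size)      # q(z_t | h_t, b_t)
        self.dec = nn.Linear(hidden_size + z_size, obs_size)    # reconstruct x_{t+1}
        self.aux = nn.Linear(z_size, hidden_size)                # auxiliary: predict b_t from z_t

    def forward(self, x):                                        # x: (T, batch, obs_size)
        h, _ = self.fwd(x)                                       # h_t summarizes x_{1:t}
        b, _ = self.bwd(torch.flip(x, dims=[0]))
        b = torch.flip(b, dims=[0])                              # b_t summarizes x_{t:T}

        prior_mu, prior_logvar = self.prior(h).chunk(2, -1)
        post_mu, post_logvar = self.post(torch.cat([h, b], -1)).chunk(2, -1)
        z = post_mu + torch.randn_like(post_mu) * (0.5 * post_logvar).exp()

        recon = F.mse_loss(self.dec(torch.cat([h[:-1], z[:-1]], -1)), x[1:])
        kl = 0.5 * (prior_logvar - post_logvar
                    + (post_logvar.exp() + (post_mu - prior_mu) ** 2)
                    / prior_logvar.exp() - 1).mean()
        aux = F.mse_loss(self.aux(z), b.detach())                # force z to carry future info
        return recon + kl + aux

At planning time, one would roll the forward RNN and the prior over latents forward without the backward RNN and score candidate trajectories in this latent space, which is the sense in which planning stays within regions where the model is valid.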

Bio: Rosemary Nan Ke is a PhD student at Polytechnique Montreal and a member of Mila. She has received awards from NVIDIA for her work, and she is the recipient of a prestigious Facebook fellowship for graduate studies.

Everyone is welcome!
