Meta-Reinforcement Learning in Nonstationary and Nonparametric Environments

Zhenshan Bing, Lukas Knak, Long Cheng, Fabrice O. Morin, Kai Huang, Alois Knoll

Research output: Contribution to journal, Article, peer-review

6 Scopus citations

Abstract

Recent state-of-the-art artificial agents lack the ability to adapt rapidly to new tasks, as they are trained exclusively for specific objectives and require massive amounts of interaction to learn new skills. Meta-reinforcement learning (meta-RL) addresses this challenge by leveraging knowledge learned from training tasks to perform well in previously unseen tasks. However, current meta-RL approaches limit themselves to narrow parametric and stationary task distributions, ignoring qualitative differences and nonstationary changes between tasks that occur in the real world. In this article, we introduce a Task-Inference-based meta-RL algorithm using explicitly parameterized Gaussian variational autoencoders (VAEs) and gated Recurrent units (TIGR), designed for nonparametric and nonstationary environments. We employ a generative model involving a VAE to capture the multimodality of the tasks. We decouple the policy training from the task-inference learning and efficiently train the inference mechanism on the basis of an unsupervised reconstruction objective. We establish a zero-shot adaptation procedure to enable the agent to adapt to nonstationary task changes. We provide a benchmark with qualitatively distinct tasks based on the half-cheetah environment and demonstrate the superior performance of TIGR compared with state-of-the-art meta-RL approaches in terms of sample efficiency (three to ten times faster), asymptotic performance, and applicability in nonparametric and nonstationary environments with zero-shot adaptation. Videos can be viewed at https://videoviewsite.wixsite.com/tigr.
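
To make the task-inference idea in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a gated recurrent unit summarizes a short window of transitions and that a linear head maps the final hidden state to the mean and log-variance of a single diagonal Gaussian latent. The class `TinyGRUEncoder`, the helper names, all dimensions, the random untrained weights, and the omission of bias terms are hypothetical choices made for illustration. The model described in the abstract is meant to capture multimodal task distributions and is trained with a reconstruction objective, neither of which this sketch reproduces.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class TinyGRUEncoder:
    """Hypothetical encoder: bias-free GRU over transitions, then a Gaussian head."""

    def __init__(self, input_dim, hidden_dim, latent_dim, seed=0):
        rng = random.Random(seed)
        d = input_dim + hidden_dim
        self.hidden_dim = hidden_dim
        self.W_z = rand_matrix(rng, hidden_dim, d)  # update gate
        self.W_r = rand_matrix(rng, hidden_dim, d)  # reset gate
        self.W_h = rand_matrix(rng, hidden_dim, d)  # candidate state
        self.W_mu = rand_matrix(rng, latent_dim, hidden_dim)
        self.W_logvar = rand_matrix(rng, latent_dim, hidden_dim)

    def step(self, h, x):
        xh = x + h  # list concatenation: [input, previous hidden state]
        z = [sigmoid(a) for a in matvec(self.W_z, xh)]
        r = [sigmoid(a) for a in matvec(self.W_r, xh)]
        xrh = x + [ri * hi for ri, hi in zip(r, h)]
        h_tilde = [math.tanh(a) for a in matvec(self.W_h, xrh)]
        # Interpolate between the old state and the candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, h_tilde)]

    def encode(self, transitions):
        h = [0.0] * self.hidden_dim
        for x in transitions:
            h = self.step(h, x)
        return matvec(self.W_mu, h), matvec(self.W_logvar, h)


def sample_latent(mu, logvar, rng):
    # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1).
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ), the usual VAE regularizer.
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


# Usage: encode five made-up 4-dimensional transition vectors.
rng = random.Random(1)
encoder = TinyGRUEncoder(input_dim=4, hidden_dim=8, latent_dim=2)
context = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(5)]
mu, logvar = encoder.encode(context)
z = sample_latent(mu, logvar, rng)
kl = kl_to_standard_normal(mu, logvar)
```

Under these assumptions, one plausible way to follow a task change without any gradient update is to call `encode` again on a sliding window of the most recent transitions at every step and pass the refreshed latent to the policy. Whether this matches the zero-shot adaptation procedure of the article is not stated in the abstract.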

Original language: English
Pages (from-to): 1-15
Number of pages: 15
Journal: IEEE Transactions on Neural Networks and Learning Systems
State: Accepted/In press - 2023

Keywords

  • Adaptation models
  • Gaussian variational autoencoder (VAE)
  • Probabilistic logic
  • Robots
  • Switches
  • Task analysis
  • Training
  • Turning
  • meta-reinforcement learning (meta-RL)
  • robotic control
  • task adaptation
  • task inference
