TY - JOUR
T1 - Meta-Reinforcement Learning in Non-Stationary and Dynamic Environments
AU - Bing, Zhenshan
AU - Lerch, David
AU - Huang, Kai
AU - Knoll, Alois
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023/3/1
Y1 - 2023/3/1
AB - In recent years, deep reinforcement learning (DRL) has developed rapidly and is now applied in various fields, such as decision-making and control tasks. However, artificial agents trained with RL algorithms require large amounts of training data, unlike humans, who are able to learn new skills from very few examples. The concept of meta-reinforcement learning (meta-RL) has recently been proposed to enable agents to learn similar but new skills from a small amount of experience by leveraging a set of tasks with a shared structure. Due to the task representation learning strategy with few-shot adaptation, most recent work is limited to narrow task distributions and stationary environments, where tasks do not change within episodes. In this work, we address those limitations and introduce a training strategy that is applicable to non-stationary environments, as well as a task representation based on Gaussian mixture models to model clustered task distributions. We evaluate our method on several continuous robotic control benchmarks. Compared with state-of-the-art methods that are only applicable to stationary environments with few-shot adaptation, our algorithm first achieves competitive asymptotic performance and superior sample efficiency in stationary environments with zero-shot adaptation. Second, our algorithm learns to perform successfully in non-stationary settings as well as in a continual learning setting, while learning well-structured task representations. Last, our algorithm learns basic distinct behaviors and well-structured task representations in task distributions with multiple qualitatively distinct tasks.
KW - Gaussian mixture model
KW - meta-reinforcement learning
KW - robotic control
KW - task adaptation
KW - task inference
UR - http://www.scopus.com/inward/record.url?scp=85133558303&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2022.3185549
DO - 10.1109/TPAMI.2022.3185549
M3 - Article
AN - SCOPUS:85133558303
SN - 0162-8828
VL - 45
SP - 3476
EP - 3491
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
IS - 3
ER -