Toward Understanding State Representation Learning in MuZero: A Case Study in Linear Quadratic Gaussian Control

Yi Tian, Kaiqing Zhang, Russ Tedrake, Suvrit Sra

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

We study the problem of representation learning for control from partial and potentially high-dimensional observations. We approach this problem via direct latent model learning, where one directly learns a dynamical model in some latent state space by predicting costs. In particular, we establish finite-sample guarantees of finding a near-optimal representation function and a near-optimal controller using the directly learned latent model for infinite-horizon time-invariant Linear Quadratic Gaussian (LQG) control. A part of our approach to latent model learning closely resembles MuZero, a recent breakthrough in empirical reinforcement learning, in that it learns latent dynamics implicitly by predicting cumulative costs. A key technical contribution of this work is to prove persistency of excitation for a new stochastic process that arises from the analysis of quadratic regression in our approach.
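
The quadratic regression mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it only shows the generic idea of fitting a symmetric matrix P so that a scalar cost is approximated by a quadratic form h^T P h in some feature vector h. The function name `fit_quadratic_form`, the arrays `H` and `c`, and the use of i.i.d. Gaussian features (a stand-in for a sufficiently excited process) are all assumptions made for illustration.

```python
import numpy as np


def fit_quadratic_form(H, c):
    """Least-squares fit of a symmetric P with c[k] ~= H[k] @ P @ H[k].

    H: (n, d) array of feature vectors (hypothetical stand-in for
       whatever the costs are regressed on).
    c: (n,) array of scalar costs.
    """
    n, d = H.shape
    # Parametrize P by its upper triangle so the design matrix has
    # no duplicated columns.
    rows, cols = np.triu_indices(d)
    # h^T P h = sum_i P_ii h_i^2 + 2 * sum_{i<j} P_ij h_i h_j
    scale = np.where(rows == cols, 1.0, 2.0)
    features = H[:, rows] * H[:, cols] * scale
    theta, *_ = np.linalg.lstsq(features, c, rcond=None)
    # Rebuild the full symmetric matrix from the fitted upper triangle.
    P = np.zeros((d, d))
    P[rows, cols] = theta
    P[cols, rows] = theta
    return P


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n = 3, 500
    A = rng.standard_normal((d, d))
    P_true = A @ A.T  # assumed ground-truth positive semidefinite matrix
    H = rng.standard_normal((n, d))  # i.i.d. features: an assumption
    noise = 0.1 * rng.standard_normal(n)
    c = np.einsum("ni,ij,nj->n", H, P_true, H) + noise
    P_hat = fit_quadratic_form(H, c)
    print("max abs error:", np.max(np.abs(P_hat - P_true)))
```

The fit is only well posed when the quadratic features span enough directions, which is where an excitation condition on the data-generating process matters. The i.i.d. Gaussian features above satisfy this trivially, whereas the abstract states that the paper proves persistency of excitation for a new stochastic process arising in its analysis.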

Original language: English
Title of host publication: 2023 62nd IEEE Conference on Decision and Control, CDC 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 6166-6171
Number of pages: 6
ISBN (Electronic): 9798350301243
State: Published - 2023
Externally published: Yes
Event: 62nd IEEE Conference on Decision and Control, CDC 2023 - Singapore, Singapore
Duration: 13 Dec 2023 to 15 Dec 2023

Publication series

Name: Proceedings of the IEEE Conference on Decision and Control
ISSN (Print): 0743-1546
ISSN (Electronic): 2576-2370

Conference

Conference: 62nd IEEE Conference on Decision and Control, CDC 2023
Country/Territory: Singapore
City: Singapore
Period: 13/12/23 to 15/12/23
