TY - JOUR
T1 - In-Context Symmetries
T2 - 38th Conference on Neural Information Processing Systems, NeurIPS 2024
AU - Gupta, Sharut
AU - Wang, Chenyu
AU - Wang, Yifei
AU - Jaakkola, Tommi
AU - Jegelka, Stefanie
N1 - Publisher Copyright:
© 2024 Neural information processing systems foundation. All rights reserved.
PY - 2024
Y1 - 2024
N2 - At the core of self-supervised learning for vision is the idea of learning invariant or equivariant representations with respect to a set of data transformations. This approach, however, introduces strong inductive biases, which can render the representations fragile in downstream tasks that do not conform to these symmetries. In this work, drawing insights from world models, we propose to instead learn a general representation that can adapt to be invariant or equivariant to different transformations by paying attention to context - a memory module that tracks task-specific states, actions, and future states. Here, the action is the transformation, while the current and future states respectively represent the input's representation before and after the transformation. Our proposed algorithm, Contextual Self-Supervised Learning (ContextSSL), learns equivariance to all transformations (as opposed to invariance). As a result, the model learns to encode all relevant features as general representations while, importantly, it can adapt to restrict to task-wise symmetries when given a few examples as the context. Empirically, we demonstrate significant performance gains over existing methods on equivariance-related tasks, supported by both qualitative and quantitative evaluations. Code is available at https://github.com/Sharut/In-Context-Symmetries.
UR - http://www.scopus.com/inward/record.url?scp=105000472681&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:105000472681
SN - 1049-5258
VL - 37
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
Y2 - 9 December 2024 through 15 December 2024
ER -