A Minimal Model for Compositional Generalization on gSCAN

Alice Hein, Klaus Diepold

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Whether neural networks are capable of compositional generalization has been a topic of much debate. Most previous studies on this subject investigate the generalization capabilities of state-of-the-art deep learning architectures. We here take a more bottom-up approach and design a minimal model that displays generalization on a compositional benchmark, namely, the gSCAN dataset. The model is a hybrid architecture that combines layers trained with gradient descent and a selective attention mechanism optimized with an evolutionary strategy. The architecture has around 60 times fewer trainable parameters than models previously tested on gSCAN, and achieves comparable accuracies on most test splits, even when trained only on a fraction of the dataset. On adverb to verb generalization accuracy, it outperforms previous approaches by 65 to 86%. Through ablation studies, neuron pruning, and error analyses, we show that weight decay and attention mechanisms facilitate compositional generalization by encouraging sparse representations divorced from irrelevant context. We find that the model’s sample efficiency can mainly be attributed to its selective attention mechanism.

Original language: English
Title of host publication: BlackboxNLP 2022 - BlackboxNLP Analyzing and Interpreting Neural Networks for NLP, Proceedings of the Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 1-15
Number of pages: 15
ISBN (Electronic): 9781959429050
State: Published - 2022
Event: 5th Workshop on Analyzing and Interpreting Neural Networks for NLP, BlackboxNLP 2022 hosted by the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 - Abu Dhabi, United Arab Emirates
Duration: 8 Dec 2022 → …

Publication series

Name: BlackboxNLP 2022 - BlackboxNLP Analyzing and Interpreting Neural Networks for NLP, Proceedings of the Workshop

Conference

Conference: 5th Workshop on Analyzing and Interpreting Neural Networks for NLP, BlackboxNLP 2022 hosted by the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 8/12/22 → …
