PrivaT5: A Generative Language Model for Privacy Policies

Mohammad Al Zoubi, T. Y.S.S. Santosh, Edgar Ricardo Chavez Rosas, Matthias Grabmair

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

In the era of digital privacy, users often neglect to read privacy policies due to their complexity. To bridge this gap, NLP models have emerged to assist in understanding privacy policies. While recent generative language models such as BART and T5 have shown prowess both in text generation and in discriminative tasks framed as generative ones, their application to privacy policy domain tasks remains unexplored. To address this, we introduce PrivaT5, a T5-based model that is further pre-trained on privacy policy text. We evaluate PrivaT5 on a diverse set of privacy-policy-related tasks and observe that it outperforms T5, showing the utility of continued domain-specific pre-training. Our results also highlight the challenges these generative models face with complex structured output label spaces, especially in sequence tagging tasks, where they fall short of lighter encoder-only models.
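To make the idea of "discriminative tasks framed as generative ones" concrete, the following is a minimal sketch of how a classification example and a sequence tagging example might each be cast as a (source, target) pair of strings for a text-to-text model. The function names, task prefixes, labels, and tag set are hypothetical and chosen only for illustration; they are not the input format used in the paper.

```python
def classification_to_text(task: str, text: str, label: str) -> tuple[str, str]:
    """Cast one classification example as a (source, target) string pair.

    The task prefix and label strings are illustrative assumptions.
    """
    return f"{task}: {text}", label


def tagging_to_text(task: str, tokens: list[str], tags: list[str]) -> tuple[str, str]:
    """Cast one sequence tagging example as a (source, target) string pair.

    The target is one tag per input token, joined by spaces, so the model
    has to generate a structured sequence of exactly the right length.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    source = f"{task}: {' '.join(tokens)}"
    target = " ".join(tags)
    return source, target


# Hypothetical examples: a single-label target versus a per-token target.
cls_pair = classification_to_text(
    "classify practice", "We keep your data for 30 days.", "data retention"
)
tag_pair = tagging_to_text(
    "tag entities", ["We", "collect", "email"], ["O", "O", "B-DATA"]
)
```

In this sketch the classification target is a single short string, whereas the tagging target must align token by token with the input. That contrast is one way to picture why a structured output label space can be harder for a generative model.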

Original language: English
Title of host publication: PrivateNLP 2024 - 5th Workshop on Privacy in Natural Language Processing, Proceedings of the Workshop
Editors: Ivan Habernal, Sepideh Ghanavati, Abhilasha Ravichander, Vijayanta Jain, Patricia Thaine, Timour Igamberdiev, Niloofar Mireshghallah, Oluwaseyi Feyisetan
Publisher: Association for Computational Linguistics (ACL)
Pages: 159-169
Number of pages: 11
ISBN (Electronic): 9798891761391
State: Published - 2024
Event: 5th Workshop on Privacy in Natural Language Processing, PrivateNLP 2024 - Co-located with ACL 2024 - Bangkok, Thailand
Duration: 15 Aug 2024 → …

Publication series

Name: PrivateNLP 2024 - 5th Workshop on Privacy in Natural Language Processing, Proceedings of the Workshop

Conference

Conference: 5th Workshop on Privacy in Natural Language Processing, PrivateNLP 2024 - Co-located with ACL 2024
Country/Territory: Thailand
City: Bangkok
Period: 15/08/24 → …
