TY - GEN
T1 - PrivaT5
T2 - 5th Workshop on Privacy in Natural Language Processing, PrivateNLP 2024 - Co-located with ACL 2024
AU - Al Zoubi, Mohammad
AU - Santosh, T. Y.S.S.
AU - Rosas, Edgar Ricardo Chavez
AU - Grabmair, Matthias
N1 - Publisher Copyright:
© 2024 Association for Computational Linguistics.
PY - 2024
Y1 - 2024
N2 - In the era of digital privacy, users often neglect to read privacy policies due to their complexity. To bridge this gap, NLP models have emerged to assist in understanding privacy policies. While recent generative language models like BART and T5 have shown prowess in text generation and in discriminative tasks framed as generative ones, their application to privacy policy domain tasks remains unexplored. To address this, we introduce PrivaT5, a T5-based model that is further pre-trained on privacy policy text. We evaluate PrivaT5 on a diverse set of privacy-policy-related tasks and observe its superior performance over T5, showing the utility of continued domain-specific pre-training. Our results also highlight the challenges these generative models face with complex structured output label spaces, especially in sequence tagging tasks, where they fall short compared to lighter encoder-only models.
AB - In the era of digital privacy, users often neglect to read privacy policies due to their complexity. To bridge this gap, NLP models have emerged to assist in understanding privacy policies. While recent generative language models like BART and T5 have shown prowess in text generation and in discriminative tasks framed as generative ones, their application to privacy policy domain tasks remains unexplored. To address this, we introduce PrivaT5, a T5-based model that is further pre-trained on privacy policy text. We evaluate PrivaT5 on a diverse set of privacy-policy-related tasks and observe its superior performance over T5, showing the utility of continued domain-specific pre-training. Our results also highlight the challenges these generative models face with complex structured output label spaces, especially in sequence tagging tasks, where they fall short compared to lighter encoder-only models.
UR - http://www.scopus.com/inward/record.url?scp=85204468886&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85204468886
T3 - PrivateNLP 2024 - 5th Workshop on Privacy in Natural Language Processing, Proceedings of the Workshop
SP - 159
EP - 169
BT - PrivateNLP 2024 - 5th Workshop on Privacy in Natural Language Processing, Proceedings of the Workshop
A2 - Habernal, Ivan
A2 - Ghanavati, Sepideh
A2 - Ravichander, Abhilasha
A2 - Jain, Vijayanta
A2 - Thaine, Patricia
A2 - Igamberdiev, Timour
A2 - Mireshghallah, Niloofar
A2 - Feyisetan, Oluwaseyi
PB - Association for Computational Linguistics (ACL)
Y2 - 15 August 2024
ER -