A Collocation-based Method for Addressing Challenges in Word-level Metric Differential Privacy

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Applications of Differential Privacy (DP) in NLP must distinguish between the syntactic level on which a proposed mechanism operates, often taking the form of word-level or document-level privatization. Recently, several word-level Metric Differential Privacy approaches have been proposed, which rely on this generalized DP notion for operating in word embedding spaces. These approaches, however, often fail to produce semantically coherent textual outputs, and their application at the sentence- or document-level is only possible by a basic composition of word perturbations. In this work, we strive to address these challenges by operating between the word and sentence levels, namely with collocations. By perturbing n-grams rather than single words, we devise a method where composed privatized outputs have higher semantic coherence and variable length. This is accomplished by constructing an embedding model based on frequently occurring word groups, in which unigram words co-exist with bi- and trigram collocations. We evaluate our method in utility and privacy tests, which make a clear case for tokenization strategies beyond the word level.
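As a rough illustration of the idea described in the abstract, the sketch below tokenizes a sentence greedily into trigram, bigram, and unigram units and then perturbs each unit in a shared embedding space. It is not the authors' implementation. The vocabulary, the random embeddings, the underscore-joined collocation format, and the function names are all hypothetical. The noise mechanism (uniform direction, Gamma-distributed magnitude, nearest-neighbour projection) is the commonly used Euclidean metric DP mechanism for word embeddings, swapped in here as an assumption, since the abstract does not specify one.

```python
import numpy as np

def tokenize_with_collocations(words, vocab, max_n=3):
    """Greedy longest match: try trigrams, then bigrams, then fall back to unigrams."""
    tokens, i = [], 0
    while i < len(words):
        for n in range(max_n, 0, -1):
            span = words[i:i + n]
            candidate = "_".join(span)
            # A unigram is always accepted so the loop is guaranteed to advance.
            if len(span) == n and (n == 1 or candidate in vocab):
                tokens.append(candidate)
                i += n
                break
    return tokens

def sample_noise(dim, epsilon, rng):
    """Draw noise with density proportional to exp(-epsilon * ||z||)."""
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)          # uniform direction on the unit sphere
    magnitude = rng.gamma(shape=dim, scale=1.0 / epsilon)
    return magnitude * direction

def privatize(tokens, vocab, embeddings, epsilon, rng):
    """Perturb each token's vector and map it to the nearest vocabulary entry.

    Assumes every token is in `vocab` (a simplification of this sketch).
    """
    index = {tok: k for k, tok in enumerate(vocab)}
    output = []
    for tok in tokens:
        noisy = embeddings[index[tok]] + sample_noise(embeddings.shape[1], epsilon, rng)
        nearest = int(np.argmin(np.linalg.norm(embeddings - noisy, axis=1)))
        output.append(vocab[nearest])
    return output

# Toy setup: hypothetical vocabulary where unigrams co-exist with collocations.
vocab = ["i", "love", "new", "york", "city", "new_york", "new_york_city"]
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), 8))       # random stand-in vectors

tokens = tokenize_with_collocations("i love new york city".split(), vocab)
private_tokens = privatize(tokens, vocab, embeddings, epsilon=1.0, rng=rng)
# Expanding collocations back into words is what makes the output length variable.
private_text = " ".join(private_tokens).replace("_", " ")
```

Because a unigram can be replaced by a collocation and vice versa, the privatized sentence may contain more or fewer words than the input, which mirrors the variable-length property mentioned in the abstract. A smaller `epsilon` yields larger noise and therefore more frequent replacements.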

Original language: English
Title of host publication: PrivateNLP 2024 - 5th Workshop on Privacy in Natural Language Processing, Proceedings of the Workshop
Editors: Ivan Habernal, Sepideh Ghanavati, Abhilasha Ravichander, Vijayanta Jain, Patricia Thaine, Timour Igamberdiev, Niloofar Mireshghallah, Oluwaseyi Feyisetan
Publisher: Association for Computational Linguistics (ACL)
Pages: 39-51
Number of pages: 13
ISBN (Electronic): 9798891761391
State: Published - 2024
Event: 5th Workshop on Privacy in Natural Language Processing, PrivateNLP 2024 - Co-located with ACL 2024 - Bangkok, Thailand
Duration: 15 Aug 2024 → …

Publication series

Name: PrivateNLP 2024 - 5th Workshop on Privacy in Natural Language Processing, Proceedings of the Workshop

Conference

Conference: 5th Workshop on Privacy in Natural Language Processing, PrivateNLP 2024 - Co-located with ACL 2024
Country/Territory: Thailand
City: Bangkok
Period: 15/08/24 → …
