A Collocation-based Method for Addressing Challenges in Word-level Metric Differential Privacy

Publication: Contribution to book/report/conference proceedings › Conference paper › Peer-reviewed

Abstract

Applications of Differential Privacy (DP) in NLP must specify the syntactic level at which a proposed mechanism operates, which typically takes the form of word-level or document-level privatization. Recently, several word-level Metric Differential Privacy approaches have been proposed that rely on this generalized DP notion to operate in word embedding spaces. These approaches, however, often fail to produce semantically coherent textual outputs, and they can only be applied at the sentence or document level through a basic composition of word perturbations. In this work, we address these challenges by operating between the word and sentence levels, namely on collocations. By perturbing n-grams rather than single words, we devise a method whose composed privatized outputs have higher semantic coherence and variable length. This is accomplished by constructing an embedding model based on frequently occurring word groups, in which unigram words coexist with bi- and trigram collocations. We evaluate our method in utility and privacy tests, whose results make a clear case for tokenization strategies beyond the word level.
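The abstract gives no implementation details, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' code. It assumes a hypothetical toy embedding table (invented 3-dimensional vectors) in which unigrams and collocations joined by "_" share one space, a greedy longest-match tokenizer over n-grams up to length 3, and the noise-plus-nearest-neighbor mechanism commonly used in word-level metric DP (noise density proportional to exp(-ε‖z‖), sampled as a uniform direction scaled by a Gamma(n, 1/ε) radius). The names `EMBEDDINGS`, `tokenize`, `sample_noise`, `nearest`, and `privatize` are hypothetical.

```python
import math
import random

# Hypothetical toy embedding table: unigrams and bi-/trigram collocations
# (joined with "_") live in the same vector space. Values are invented.
EMBEDDINGS = {
    "i": (1.0, 0.0, 0.0),
    "we": (0.9, 0.2, 0.0),
    "live": (0.0, 1.0, 0.0),
    "work": (0.2, 0.9, 0.0),
    "in": (0.0, 0.0, 1.0),
    "at": (0.1, 0.1, 0.9),
    "new": (0.6, 0.6, 0.2),
    "york": (0.5, 0.3, 0.6),
    "city": (0.3, 0.5, 0.6),
    "new_york": (0.6, 0.2, 0.7),
    "new_york_city": (0.7, 0.2, 0.8),
    "los_angeles": (0.5, 0.1, 0.8),
    "boston": (0.6, 0.1, 0.6),
}


def tokenize(words, max_n=3):
    """Greedy longest match: prefer a known trigram, then bigram, then unigram."""
    tokens, i = [], 0
    while i < len(words):
        for n in range(min(max_n, len(words) - i), 0, -1):
            candidate = "_".join(words[i:i + n])
            # A single word is always accepted, even if it is out of vocabulary.
            if n == 1 or candidate in EMBEDDINGS:
                tokens.append(candidate)
                i += n
                break
    return tokens


def sample_noise(dim, epsilon, rng):
    """Draw z in R^dim with density proportional to exp(-epsilon * ||z||)."""
    # Uniform direction on the unit sphere: normalize a standard Gaussian vector.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in direction))
    # Radius ~ Gamma(shape=dim, scale=1/epsilon); gammavariate takes (alpha, beta=scale).
    radius = rng.gammavariate(dim, 1.0 / epsilon)
    return [radius * x / norm for x in direction]


def nearest(vector):
    """Return the vocabulary entry (unigram or collocation) closest in Euclidean distance."""
    return min(EMBEDDINGS, key=lambda tok: math.dist(vector, EMBEDDINGS[tok]))


def privatize(text, epsilon, seed=None):
    """Perturb each unigram/collocation token independently and decode to text."""
    rng = random.Random(seed)
    out = []
    for tok in tokenize(text.lower().split()):
        if tok not in EMBEDDINGS:
            # Simplification: OOV tokens pass through unchanged, which leaks them.
            out.append(tok)
            continue
        vec = EMBEDDINGS[tok]
        noise = sample_noise(len(vec), epsilon, rng)
        noisy = [v + z for v, z in zip(vec, noise)]
        out.append(nearest(noisy))
    # Collocations expand back to multiple words, so output length can vary.
    return " ".join(t.replace("_", " ") for t in out)


if __name__ == "__main__":
    print(tokenize("i live in new york city".split()))
    print(privatize("I live in New York", epsilon=2.0, seed=42))
```

In this sketch, a two-word collocation such as "new york" may be replaced by a single-word token such as "boston" (or vice versa), which illustrates how perturbing n-grams can yield privatized outputs of variable length. A real system would also need a policy for out-of-vocabulary tokens instead of passing them through.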

Original language: English
Title of host publication: PrivateNLP 2024 - 5th Workshop on Privacy in Natural Language Processing, Proceedings of the Workshop
Editors: Ivan Habernal, Sepideh Ghanavati, Abhilasha Ravichander, Vijayanta Jain, Patricia Thaine, Timour Igamberdiev, Niloofar Mireshghallah, Oluwaseyi Feyisetan
Publisher: Association for Computational Linguistics (ACL)
Pages: 39-51
Number of pages: 13
ISBN (electronic): 9798891761391
Publication status: Published - 2024
Event: 5th Workshop on Privacy in Natural Language Processing, PrivateNLP 2024 - Co-located with ACL 2024 - Bangkok, Thailand
Duration: 15 Aug 2024 → …

Publication series

Name: PrivateNLP 2024 - 5th Workshop on Privacy in Natural Language Processing, Proceedings of the Workshop

Conference

Conference: 5th Workshop on Privacy in Natural Language Processing, PrivateNLP 2024 - Co-located with ACL 2024
Country/Territory: Thailand
City: Bangkok
Period: 15/08/24 → …