DP-MLM: Differentially Private Text Rewriting Using Masked Language Models

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

The task of text privatization using Differential Privacy has recently taken the form of text rewriting, in which an input text is obfuscated via generative (large) language models. While these methods have shown promising results in preserving privacy, they rely on autoregressive models, which lack a mechanism to contextualize the private rewriting process. In response, we propose DP-MLM, a new method for differentially private text rewriting that leverages masked language models (MLMs) to rewrite text in a semantically similar yet obfuscated manner. We accomplish this with a simple contextualization technique, whereby we rewrite a text one token at a time. We find that encoder-only MLMs provide better utility preservation at lower ε levels than previous methods relying on larger models with a decoder. In addition, MLMs allow for greater customization of the rewriting mechanism than generative approaches do. We make the code for DP-MLM public and reusable, found at https://github.com/sjmeis/DPMLM.
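The core idea, rewriting a text one token at a time and choosing each replacement privately from a language model's predictions, can be illustrated with a minimal sketch. It assumes the common exponential-mechanism recipe for logit-based DP rewriting: logits are clipped to a range [lo, hi], which bounds the per-token utility's sensitivity by hi - lo, and each token is sampled with probability proportional to exp(ε·u / (2(hi - lo))). This is not taken from the DP-MLM code. `logit_fn`, `toy_logits`, `epsilon_per_token`, `lo`, and `hi` are hypothetical names, and the masked language model is replaced by a toy scorer so the example runs without any model.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: returns a score (logit) for each candidate
# replacement at position i, given the original tokens. A real system would
# query a masked language model here; how it is prompted is left open.
LogitFn = Callable[[Sequence[str], int], Dict[str, float]]


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x into [lo, hi] so each utility score has a bounded range."""
    return max(lo, min(hi, x))


def exp_mech_sample(logits: Dict[str, float], epsilon: float,
                    lo: float, hi: float, rng: random.Random) -> str:
    """Pick one token with the exponential mechanism over clipped logits.

    Clipping bounds the utility's sensitivity by (hi - lo), so sampling with
    probability proportional to exp(epsilon * u / (2 * (hi - lo))) is
    epsilon-DP for this single choice.
    """
    if epsilon <= 0 or hi <= lo:
        raise ValueError("need epsilon > 0 and hi > lo")
    sensitivity = hi - lo
    tokens = list(logits)
    scores = [epsilon * clip(logits[t], lo, hi) / (2 * sensitivity)
              for t in tokens]
    top = max(scores)  # subtract the max before exp() to avoid overflow
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(tokens, weights=weights, k=1)[0]


def rewrite(tokens: List[str], logit_fn: LogitFn, epsilon_per_token: float,
            lo: float, hi: float, seed: int = 0) -> List[str]:
    """Rewrite the text one token at a time, each choice made privately."""
    rng = random.Random(seed)
    return [exp_mech_sample(logit_fn(tokens, i), epsilon_per_token, lo, hi, rng)
            for i in range(len(tokens))]


def toy_logits(tokens: Sequence[str], i: int) -> Dict[str, float]:
    """Hypothetical stand-in for an MLM: the original word scores highest."""
    scores = {"something": 0.2, "else": 0.0}
    scores[tokens[i]] = 1.0
    return scores


if __name__ == "__main__":
    text = "the movie was great".split()
    print(rewrite(text, toy_logits, epsilon_per_token=1.0, lo=0.0, hi=1.0))
```

Under sequential composition, rewriting n tokens this way consumes at most n × epsilon_per_token of privacy budget. A larger ε concentrates probability on the model's top-scoring candidate (more utility, less obfuscation), while a smaller ε flattens the distribution toward uniform over the candidates.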

Original language: English
Title of host publication: 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Proceedings of the Conference
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Publisher: Association for Computational Linguistics (ACL)
Pages: 9314-9328
Number of pages: 15
ISBN (Electronic): 9798891760998
State: Published - 2024
Event: Findings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Hybrid, Bangkok, Thailand
Duration: 11 Aug 2024 to 16 Aug 2024

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: Findings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024
Country/Territory: Thailand
City: Hybrid, Bangkok
Period: 11/08/24 to 16/08/24
