DeepMemory: Model-based Memorization Analysis of Deep Neural Language Models

Derui Zhu, Jinfu Chen, Weiyi Shang, Xuebing Zhou, Jens Grossklags, Ahmed E. Hassan

Publication: Contribution to book/report › Conference contribution › Peer-reviewed

5 citations (Scopus)

Abstract

Neural network models are having a significant impact on many real-world applications. Unfortunately, the increasing popularity and complexity of these models also amplify their security and privacy challenges, with privacy leakage from training data being one of the most prominent issues. In this context, prior studies proposed analyzing the abstraction behavior of neural network models, e.g., RNNs, to understand their robustness. However, existing research rarely addresses privacy breaches caused by memorization in neural language models. To fill this gap, we propose a novel approach, DeepMemory, that analyzes the memorization behavior of a neural language model. We first construct a memorization-analysis-oriented model, taking both the training data and a neural language model as input. We then build a semantic first-order Markov model that binds the constructed memorization-analysis-oriented model to the training data to analyze the memorization distribution. Finally, we apply our approach to identify data leakage issues associated with memorization and to assist in dememorization. We evaluate our approach on one of the most popular neural language models, the LSTM-based language model, with three public datasets, namely, WikiText-103, WMT2017, and IWSLT2016. We find that sentences in the studied datasets with low perplexity are more likely to be memorized. Our approach achieves an average AUC of 0.73 in automatically identifying data leakage issues. We also show that, with the assistance of DeepMemory, data breaches due to memorization of neural language models can be successfully mitigated by mutating training data without reducing the performance of the neural language models.
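The abstract's key signal is sentence perplexity: sentences the model assigns low perplexity are more likely to be memorized. As a minimal sketch (not the paper's implementation), perplexity can be computed from the per-token probabilities a language model assigns to a sentence; the probability values below are hypothetical placeholders standing in for an actual model's outputs.

```python
import math

def perplexity(token_probs):
    """Perplexity of a sentence, given the model's probability for each token.

    perplexity = exp( -(1/n) * sum(log p_i) )
    """
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)

# A sentence whose tokens the model predicts confidently yields low perplexity;
# per the paper's finding, such sentences are more likely to be memorized.
confident = perplexity([0.9, 0.8, 0.95])   # low perplexity
uncertain = perplexity([0.10, 0.20, 0.05])  # high perplexity
```

Under a uniform distribution over k choices, perplexity equals k, which is a handy sanity check for the formula.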

Original language: English
Title: Proceedings - 2021 36th IEEE/ACM International Conference on Automated Software Engineering, ASE 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1003-1015
Number of pages: 13
ISBN (electronic): 9781665403375
DOIs
Publication status: Published - 2021
Event: 36th IEEE/ACM International Conference on Automated Software Engineering, ASE 2021 - Virtual, Online, Australia
Duration: 15 Nov 2021 - 19 Nov 2021

Publication series

Name: Proceedings - 2021 36th IEEE/ACM International Conference on Automated Software Engineering, ASE 2021

Conference

Conference: 36th IEEE/ACM International Conference on Automated Software Engineering, ASE 2021
Country/Territory: Australia
City: Virtual, Online
Period: 15/11/21 - 19/11/21
