DeepMemory: Model-based Memorization Analysis of Deep Neural Language Models

Derui Zhu, Jinfu Chen, Weiyi Shang, Xuebing Zhou, Jens Grossklags, Ahmed E. Hassan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citation

Abstract

Neural network models have a significant impact on many real-world applications. Unfortunately, the increasing popularity and complexity of these models also amplify their security and privacy challenges, with privacy leakage from training data being one of the most prominent issues. In this context, prior studies proposed to analyze the abstraction behavior of neural network models, e.g., RNNs, to understand their robustness. However, existing research rarely addresses privacy breaches caused by memorization in neural language models. To fill this gap, we propose a novel approach, DeepMemory, that analyzes the memorization behavior of a neural language model. We first construct a memorization-analysis-oriented model, taking both training data and a neural language model as input. We then build a semantic first-order Markov model to bind the constructed memorization-analysis-oriented model to the training data in order to analyze the memorization distribution. Finally, we apply our approach to address data leakage issues associated with memorization and to assist in dememorization. We evaluate our approach on one of the most popular neural language models, the LSTM-based language model, with three public datasets, namely, WikiText-103, WMT2017, and IWSLT2016. We find that sentences in the studied datasets with low perplexity are more likely to be memorized. Our approach achieves an average AUC of 0.73 in automatically identifying data leakage issues during assessment. We also show that, with the assistance of DeepMemory, data breaches due to memorization in neural language models can be successfully mitigated by mutating training data without reducing the performance of the neural language models.
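The abstract's key signal, that low-perplexity sentences are more likely memorized, can be illustrated with a minimal sketch. This is not the paper's DeepMemory implementation; it stands in a first-order Markov (bigram) model with add-alpha smoothing for the language model, computes per-sentence perplexity, and flags sentences below a (hypothetical) perplexity threshold as memorization candidates. All function names and the threshold value are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

def train_bigram_lm(corpus, alpha=1.0):
    """Fit a first-order Markov (bigram) model with add-alpha smoothing.

    Returns a log-probability function log P(cur | prev).
    This is a toy stand-in for the LSTM language model studied in the paper.
    """
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    vocab = set()
    for sent in corpus:
        toks = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(toks)
        for prev, cur in zip(toks, toks[1:]):
            unigrams[prev] += 1
            bigrams[prev][cur] += 1
    vocab_size = len(vocab)

    def log_prob(prev, cur):
        # Add-alpha smoothing keeps unseen transitions from having zero probability.
        return math.log((bigrams[prev][cur] + alpha) /
                        (unigrams[prev] + alpha * vocab_size))
    return log_prob

def perplexity(log_prob, sentence):
    """Per-token perplexity: exp of the average negative log-likelihood."""
    toks = ["<s>"] + sentence.split() + ["</s>"]
    total = sum(log_prob(p, c) for p, c in zip(toks, toks[1:]))
    return math.exp(-total / (len(toks) - 1))

def flag_memorized(log_prob, sentences, threshold):
    """Low perplexity suggests the model has memorized the sentence."""
    return [s for s in sentences if perplexity(log_prob, s) < threshold]
```

As a usage example, a sentence repeated many times in the training corpus receives a much lower perplexity than a sentence seen once, so it is the one flagged, mirroring the abstract's observation that memorized training sentences sit in the low-perplexity tail.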

Original language: English
Title of host publication: Proceedings - 2021 36th IEEE/ACM International Conference on Automated Software Engineering, ASE 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1003-1015
Number of pages: 13
ISBN (Electronic): 9781665403375
DOIs
State: Published - 2021
Event: 36th IEEE/ACM International Conference on Automated Software Engineering, ASE 2021 - Virtual, Online, Australia
Duration: 15 Nov 2021 - 19 Nov 2021

Publication series

Name: Proceedings - 2021 36th IEEE/ACM International Conference on Automated Software Engineering, ASE 2021

Conference

Conference: 36th IEEE/ACM International Conference on Automated Software Engineering, ASE 2021
Country/Territory: Australia
City: Virtual, Online
Period: 15/11/21 - 19/11/21

Keywords

  • Deep learning
  • Memorization
  • Model-based analysis
  • Neural language model
  • Privacy

