Scanning electronic documents for personally identifiable information

Tuomas Aura, Thomas A. Kuhn, Michael Roe

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

28 Scopus citations

Abstract

Sometimes, it is necessary to remove author names and other personally identifiable information (PII) from documents before publication. We have implemented a novel defensive tool for detecting such data automatically. By using the detection tool, we have learned about where PII may be stored in documents and how it is put there. A key observation is that, contrary to common belief, user and machine identifiers and other metadata are not embedded in documents only by a single piece of software, such as a word processor, but by various tools used at different stages of the document authoring process.

Original language: English
Title of host publication: Proceedings of the 5th ACM Workshop on Privacy in Electronic Society, WPES 2006, Co-located with the 13th ACM Conference on Computer and Communications Security, CCS 2006
Pages: 41-49
Number of pages: 9
State: Published - 2006
Event: 5th ACM Workshop on Privacy in Electronic Society, WPES 2006, Co-located with the 13th ACM Conference on Computer and Communications Security, CCS 2006 - Alexandria, VA, United States
Duration: 30 Oct 2006 to 30 Oct 2006

Publication series

Name: Proceedings of the ACM Conference on Computer and Communications Security
ISSN (Print): 1543-7221

Conference

Conference: 5th ACM Workshop on Privacy in Electronic Society, WPES 2006, Co-located with the 13th ACM Conference on Computer and Communications Security, CCS 2006
Country/Territory: United States
City: Alexandria, VA
Period: 30/10/06 to 30/10/06

Keywords

  • metadata
  • personally identifiable information
  • privacy
