Coding over Sets for DNA Storage

Andreas Lenz, Paul H. Siegel, Antonia Wachter-Zeh, Eitan Yaakobi

Research output: Contribution to journalArticlepeer-review

78 Scopus citations

Abstract

In this paper we study error-correcting codes for the storage of data in synthetic deoxyribonucleic acid (DNA). We investigate a storage model where a data set is represented by an unordered set of M sequences, each of length L. Errors within that model are a loss of whole sequences and point errors inside the sequences, such as insertions, deletions and substitutions. We derive Gilbert-Varshamov lower bounds and sphere packing upper bounds on achievable cardinalities of error-correcting codes within this storage model. We further propose explicit code constructions than can correct errors in such a storage system that can be encoded and decoded efficiently. Comparing the sizes of these codes to the upper bounds, we show that many of the constructions are close to optimal.

Original languageEnglish
Article number8937735
Pages (from-to)2331-2351
Number of pages21
JournalIEEE Transactions on Information Theory
Volume66
Issue number4
DOIs
StatePublished - Apr 2020
Externally publishedYes

Keywords

  • Coding over sets
  • DNA data storage
  • Gilbert-Varshamov bound
  • insertion and deletion errors
  • sphere packing bound

Fingerprint

Dive into the research topics of 'Coding over Sets for DNA Storage'. Together they form a unique fingerprint.

Cite this