Embracing Errors Can Be More Efficient Than Avoiding Them Through Constrained Coding for DNA Data Storage

  • Technical University of Munich
  • ETH Zurich

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

DNA is an attractive medium for digital data storage. When data is stored on DNA, errors occur, making error-correcting codes critical for reliable storage. A common approach to reduce errors is constrained coding, which avoids homopolymers (consecutive repeated nucleotides) and balances GC content, as they are associated with higher error rates. However, constrained coding comes at the cost of an increase in redundancy. An alternative is to randomize DNA sequences, embrace errors, and compensate with additional coding redundancy. In this paper, we identify the error regimes in which embracing substitution errors is more efficient than constrained coding. Our results indicate that constrained coding for substitution errors can be inefficient in current DNA data storage systems. Theoretical analysis shows that constrained coding would be efficient only under high error rates in homopolymers and GC-imbalanced sequences, while empirical data show that error-rate increases for these nucleotides are minimal in current systems.
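To make the two constraints named in the abstract concrete, here is a minimal Python sketch that checks a DNA sequence for homopolymer runs and GC balance. The function names and the thresholds (`max_run=3`, GC content between 0.4 and 0.6) are illustrative assumptions, not values taken from the paper.

```python
from itertools import groupby


def max_homopolymer_run(seq: str) -> int:
    """Length of the longest run of identical consecutive nucleotides."""
    return max((sum(1 for _ in group) for _, group in groupby(seq)), default=0)


def gc_content(seq: str) -> float:
    """Fraction of nucleotides that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def satisfies_constraints(seq: str, max_run: int = 3,
                          gc_min: float = 0.4, gc_max: float = 0.6) -> bool:
    """Check both constraints; the thresholds are hypothetical defaults."""
    return (max_homopolymer_run(seq) <= max_run
            and gc_min <= gc_content(seq) <= gc_max)


if __name__ == "__main__":
    for s in ("ACGTACGT", "AAAAACGT"):
        print(s, max_homopolymer_run(s), gc_content(s), satisfies_constraints(s))
```

A constrained code only emits sequences that pass a check of this kind, which is where its extra redundancy comes from. The alternative described in the abstract skips the check, randomizes the sequences, and spends the redundancy on error correction instead.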

Original language: English
Pages (from-to): 146-156
Number of pages: 11
Journal: IEEE Transactions on Molecular, Biological, and Multi-Scale Communications
Volume: 12
State: Published - 2026

Keywords

  • DNA data storage
  • Gilbert-Varshamov bounds
  • achievable code rates
  • constrained coding
  • random coding
