Abstract
DNA is an attractive medium for digital data storage. When data is stored on DNA, errors occur, making error-correcting codes critical for reliable storage. A common approach to reduce errors is constrained coding, which avoids homopolymers (consecutive repeated nucleotides) and balances GC content, as they are associated with higher error rates. However, constrained coding comes at the cost of an increase in redundancy. An alternative is to randomize DNA sequences, embrace errors, and compensate with additional coding redundancy. In this paper, we identify the error regimes in which embracing substitution errors is more efficient than constrained coding. Our results indicate that constrained coding for substitution errors can be inefficient in current DNA data storage systems. Theoretical analysis shows that constrained coding would be efficient only under high error rates in homopolymers and GC-imbalanced sequences, while empirical data show that error-rate increases for these nucleotides are minimal in current systems.
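As a rough illustration of the two constraints named in the abstract, the following Python sketch measures the longest homopolymer run and the GC content of a sequence and checks them against limits. The function names and the thresholds (`max_run=3`, GC content between 0.4 and 0.6) are hypothetical values chosen for the example. They are not taken from the paper.

```python
from itertools import groupby


def max_homopolymer_run(seq: str) -> int:
    """Return the length of the longest run of one repeated nucleotide."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)


def gc_content(seq: str) -> float:
    """Return the fraction of nucleotides that are G or C (0.0 if empty)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def satisfies_constraints(seq: str, max_run: int = 3,
                          gc_low: float = 0.4, gc_high: float = 0.6) -> bool:
    """Check both constraints; the default thresholds are illustrative assumptions."""
    return (max_homopolymer_run(seq) <= max_run
            and gc_low <= gc_content(seq) <= gc_high)
```

A constrained code would only emit sequences for which `satisfies_constraints` holds, which is where its extra redundancy comes from. The alternative described in the abstract skips this check and spends redundancy on error correction instead.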
| Original language | English |
|---|---|
| Pages (from-to) | 146-156 |
| Number of pages | 11 |
| Journal | IEEE Transactions on Molecular, Biological, and Multi-Scale Communications |
| Volume | 12 |
| DOIs | |
| State | Published - 2026 |
Keywords
- DNA data storage
- Gilbert-Varshamov bounds
- achievable code rates
- constrained coding
- random coding
Fingerprint
Dive into the research topics of 'Embracing Errors Can Be More Efficient Than Avoiding Them Through Constrained Coding for DNA Data Storage'. Together they form a unique fingerprint.