TY - GEN
T1 - Embracing errors is more effective than avoiding them through constrained coding for DNA data storage
AU - Weindel, Franziska
AU - Gimpel, Andreas L.
AU - Grass, Robert N.
AU - Heckel, Reinhard
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - DNA is an attractive medium for digital data storage. When data is stored on DNA, errors occur, which makes error-correcting coding techniques critical for reliable DNA data storage. To reduce the number of errors, a common technique is to include constraints that avoid homopolymers (consecutive repeated nucleotides) and balance the GC content, as sequences with homopolymers and unbalanced GC contents are often associated with larger error rates. However, constrained coding comes at the cost of an increase in redundancy. An alternative is to control the errors by randomizing the sequences, embracing the extra errors, and paying for them with additional coding redundancy. In this paper, we determine the error regimes in which embracing errors is more efficient than constrained coding. We find that constrained coding is inefficient in most common error regimes for DNA data storage. Specifically, the error probabilities for homopolymers and unbalanced GC contents must be very large for constrained coding to achieve a higher code rate than unconstrained coding.
UR - http://www.scopus.com/inward/record.url?scp=85179522177&partnerID=8YFLogxK
U2 - 10.1109/Allerton58177.2023.10313494
DO - 10.1109/Allerton58177.2023.10313494
M3 - Conference contribution
AN - SCOPUS:85179522177
T3 - 2023 59th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2023
BT - 2023 59th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 59th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2023
Y2 - 26 September 2023 through 29 September 2023
ER -