TY - JOUR
T1 - Reading and writing digital data in DNA
AU - Meiser, Linda C.
AU - Antkowiak, Philipp L.
AU - Koch, Julian
AU - Chen, Weida D.
AU - Kohll, A. Xavier
AU - Stark, Wendelin J.
AU - Heckel, Reinhard
AU - Grass, Robert N.
N1 - Publisher Copyright: © 2019, The Author(s), under exclusive licence to Springer Nature Limited.
PY - 2020/1/1
Y1 - 2020/1/1
N2 - Because of its longevity and enormous information density, DNA is considered a promising data storage medium. In this work, we provide instructions for archiving digital information in the form of DNA and for subsequently retrieving it from the DNA. In principle, information can be represented in DNA by simply mapping the digital information to DNA and synthesizing it. However, imperfections in synthesis, sequencing, storage and handling of the DNA induce errors within the molecules, making error-free information storage challenging. The procedure discussed here enables error-free storage by protecting the information using error-correcting codes. Specifically, in this protocol, we provide the technical details and precise instructions for translating digital information to DNA sequences, physically handling the biomolecules, storing them and subsequently re-obtaining the information by sequencing the DNA. Along with the protocol, we provide computer code that automatically encodes digital information to DNA sequences and decodes the information back from DNA to a digital file. The required software is provided on a GitHub repository. The protocol relies on commercial DNA synthesis and DNA sequencing via Illumina dye sequencing, and requires 1–2 h of preparation time, 1/2 d for sequencing preparation and 2–4 h for data analysis. This protocol focuses on storage scales of ~100 kB to 15 MB, offering an ideal starting point for small experiments. It can be augmented to enable higher data volumes and random access to the data and also allows for future sequencing and synthesis technologies, by changing the parameters of the encoder/decoder to account for the corresponding error rates.
AB - Because of its longevity and enormous information density, DNA is considered a promising data storage medium. In this work, we provide instructions for archiving digital information in the form of DNA and for subsequently retrieving it from the DNA. In principle, information can be represented in DNA by simply mapping the digital information to DNA and synthesizing it. However, imperfections in synthesis, sequencing, storage and handling of the DNA induce errors within the molecules, making error-free information storage challenging. The procedure discussed here enables error-free storage by protecting the information using error-correcting codes. Specifically, in this protocol, we provide the technical details and precise instructions for translating digital information to DNA sequences, physically handling the biomolecules, storing them and subsequently re-obtaining the information by sequencing the DNA. Along with the protocol, we provide computer code that automatically encodes digital information to DNA sequences and decodes the information back from DNA to a digital file. The required software is provided on a GitHub repository. The protocol relies on commercial DNA synthesis and DNA sequencing via Illumina dye sequencing, and requires 1–2 h of preparation time, 1/2 d for sequencing preparation and 2–4 h for data analysis. This protocol focuses on storage scales of ~100 kB to 15 MB, offering an ideal starting point for small experiments. It can be augmented to enable higher data volumes and random access to the data and also allows for future sequencing and synthesis technologies, by changing the parameters of the encoder/decoder to account for the corresponding error rates.
UR - http://www.scopus.com/inward/record.url?scp=85075964642&partnerID=8YFLogxK
U2 - 10.1038/s41596-019-0244-5
DO - 10.1038/s41596-019-0244-5
M3 - Article
C2 - 31784718
AN - SCOPUS:85075964642
SN - 1754-2189
VL - 15
SP - 86
EP - 101
JO - Nature Protocols
JF - Nature Protocols
IS - 1
ER -