Abstract
In this paper we study error-correcting codes for the storage of data in synthetic deoxyribonucleic acid (DNA). We investigate a storage model where a data set is represented by an unordered set of M sequences, each of length L. Errors within that model are a loss of whole sequences and point errors inside the sequences, such as insertions, deletions and substitutions. We derive Gilbert-Varshamov lower bounds and sphere packing upper bounds on achievable cardinalities of error-correcting codes within this storage model. We further propose explicit code constructions than can correct errors in such a storage system that can be encoded and decoded efficiently. Comparing the sizes of these codes to the upper bounds, we show that many of the constructions are close to optimal.
| Original language | English |
|---|---|
| Article number | 8937735 |
| Pages (from-to) | 2331-2351 |
| Number of pages | 21 |
| Journal | IEEE Transactions on Information Theory |
| Volume | 66 |
| Issue number | 4 |
| DOIs | |
| State | Published - Apr 2020 |
| Externally published | Yes |
Keywords
- Coding over sets
- DNA data storage
- Gilbert-Varshamov bound
- insertion and deletion errors
- sphere packing bound
Fingerprint
Dive into the research topics of 'Coding over Sets for DNA Storage'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver