TY - JOUR
T1 - Insertion and Deletion Correction in Polymer-Based Data Storage
AU - Banerjee, Anisha
AU - Wachter-Zeh, Antonia
AU - Yaakobi, Eitan
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023/7/1
Y1 - 2023/7/1
N2 - Synthetic polymer-based data storage seems to be a particularly promising candidate that could help to cope with the ever-increasing demand for archival storage requirements. It involves designing molecules of distinct masses to represent the respective bits {0, 1}, followed by the synthesis of a polymer of molecular units that reflects the order of bits in the information string. Reading out the stored data requires the use of a tandem mass spectrometer, that fragments the polymer into shorter substrings and provides their corresponding masses, from which the composition, i.e. the number of 1s and 0s in the concerned substring can be inferred. Prior works have dealt with the problem of unique string reconstruction from the set of all possible compositions, called composition multiset. This was accomplished either by determining which string lengths always allow unique reconstruction, or by formulating coding constraints to facilitate the same for all string lengths. Additionally, error-correcting schemes to deal with substitution errors caused by imprecise fragmentation during the readout process, have also been suggested. This work builds on this research by extending previously considered error models, mainly confined to substitution of compositions. To this end, we define new error models that consider insertions of spurious compositions and deletions of existing ones, thereby corrupting the composition multiset. We analyze if the reconstruction codebook proposed by Pattabiraman et al. is indeed robust to such errors, and if not, propose new coding constraints to remedy this.
AB - Synthetic polymer-based data storage seems to be a particularly promising candidate that could help to cope with the ever-increasing demand for archival storage requirements. It involves designing molecules of distinct masses to represent the respective bits {0, 1}, followed by the synthesis of a polymer of molecular units that reflects the order of bits in the information string. Reading out the stored data requires the use of a tandem mass spectrometer, that fragments the polymer into shorter substrings and provides their corresponding masses, from which the composition, i.e. the number of 1s and 0s in the concerned substring can be inferred. Prior works have dealt with the problem of unique string reconstruction from the set of all possible compositions, called composition multiset. This was accomplished either by determining which string lengths always allow unique reconstruction, or by formulating coding constraints to facilitate the same for all string lengths. Additionally, error-correcting schemes to deal with substitution errors caused by imprecise fragmentation during the readout process, have also been suggested. This work builds on this research by extending previously considered error models, mainly confined to substitution of compositions. To this end, we define new error models that consider insertions of spurious compositions and deletions of existing ones, thereby corrupting the composition multiset. We analyze if the reconstruction codebook proposed by Pattabiraman et al. is indeed robust to such errors, and if not, propose new coding constraints to remedy this.
KW - Polymer-based data storage
KW - composition errors
KW - deletions
KW - insertions
KW - string reconstruction
UR - http://www.scopus.com/inward/record.url?scp=85149400212&partnerID=8YFLogxK
U2 - 10.1109/TIT.2023.3252045
DO - 10.1109/TIT.2023.3252045
M3 - Article
AN - SCOPUS:85149400212
SN - 0018-9448
VL - 69
SP - 4384
EP - 4406
JO - IEEE Transactions on Information Theory
JF - IEEE Transactions on Information Theory
IS - 7
ER -