TY - GEN
T1 - Insertion and Deletion Correction in Polymer-based Data Storage
AU - Banerjee, Anisha
AU - Wachter-Zeh, Antonia
AU - Yaakobi, Eitan
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Synthetic polymer-based storage promises to accommodate the ever-increasing demand for archival storage. It involves designing molecules of distinct masses to represent the respective bits {0, 1}, followed by the synthesis of a polymer of molecular units that reflects the order of bits in the information string. The stored data can be read by means of a tandem mass spectrometer that fragments the polymer into shorter substrings and provides their corresponding masses, from which the composition, i.e., the number of 1s and 0s in the concerned substring, can be inferred. Prior works tackled the problem of unique string reconstruction from the set of all possible compositions, called the composition multiset. This was accomplished either by determining which string lengths always allow unique reconstruction, or by formulating coding constraints to facilitate the same for all string lengths. Additionally, error-correcting schemes to deal with substitution errors caused by imprecise fragmentation during the readout process have also been suggested. This work extends previously considered error models that were mainly confined to substitutions of compositions. Our new error models consider insertions and deletions of compositions. The robustness of the reconstruction codebook proposed by Pattabiraman et al. to such errors is examined, and whenever necessary, new coding constraints are proposed to ensure unique reconstruction.
AB - Synthetic polymer-based storage promises to accommodate the ever-increasing demand for archival storage. It involves designing molecules of distinct masses to represent the respective bits {0, 1}, followed by the synthesis of a polymer of molecular units that reflects the order of bits in the information string. The stored data can be read by means of a tandem mass spectrometer that fragments the polymer into shorter substrings and provides their corresponding masses, from which the composition, i.e., the number of 1s and 0s in the concerned substring, can be inferred. Prior works tackled the problem of unique string reconstruction from the set of all possible compositions, called the composition multiset. This was accomplished either by determining which string lengths always allow unique reconstruction, or by formulating coding constraints to facilitate the same for all string lengths. Additionally, error-correcting schemes to deal with substitution errors caused by imprecise fragmentation during the readout process have also been suggested. This work extends previously considered error models that were mainly confined to substitutions of compositions. Our new error models consider insertions and deletions of compositions. The robustness of the reconstruction codebook proposed by Pattabiraman et al. to such errors is examined, and whenever necessary, new coding constraints are proposed to ensure unique reconstruction.
UR - http://www.scopus.com/inward/record.url?scp=85136305276&partnerID=8YFLogxK
U2 - 10.1109/ISIT50566.2022.9834352
DO - 10.1109/ISIT50566.2022.9834352
M3 - Conference contribution
AN - SCOPUS:85136305276
T3 - IEEE International Symposium on Information Theory - Proceedings
SP - 802
EP - 807
BT - 2022 IEEE International Symposium on Information Theory, ISIT 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE International Symposium on Information Theory, ISIT 2022
Y2 - 26 June 2022 through 1 July 2022
ER -