Abstract | ||
---|---|---|
In this paper we study error-correcting codes for the storage of data in synthetic deoxyribonucleic acid (DNA). We investigate a storage model where a data set is represented by an unordered set of
<inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$M$ </tex-math></inline-formula>
sequences, each of length
<inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$L$ </tex-math></inline-formula>
. Errors within that model are a loss of whole sequences and point errors inside the sequences, such as insertions, deletions and substitutions. We derive Gilbert-Varshamov lower bounds and sphere packing upper bounds on achievable cardinalities of error-correcting codes within this storage model. We further propose explicit code constructions than can correct errors in such a storage system that can be encoded and decoded efficiently. Comparing the sizes of these codes to the upper bounds, we show that many of the constructions are close to optimal. |
Year | DOI | Venue |
---|---|---|
2020 | 10.1109/TIT.2019.2961265 | IEEE Transactions on Information Theory |
Keywords | Field | DocType |
Coding over sets,DNA data storage,Gilbert-Varshamov bound,insertion and deletion errors,sphere packing bound | Computer science,Coding (social sciences),DNA,Computational biology | Journal |
Volume | Issue | ISSN |
66 | 4 | 0018-9448 |
Citations | PageRank | References |
6 | 0.59 | 0 |
Authors | ||
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Andreas Lenz | 1 | 21 | 5.73 |
Paul H. Siegel | 2 | 1142 | 105.90 |
Antonia Wachter-Zeh | 3 | 129 | 33.65 |
Eitan Yaakobi | 4 | 604 | 70.41 |