Title
An Upper Bound on the Capacity of the DNA Storage Channel
Abstract
Paved by recent advances in sequencing and synthesis technologies, DNA has evolved to a competitive medium for long-term data storage. In this paper we conduct an information theoretic study of the storage channel-the entity that formulates the relation between stored and sequenced strands. In particular, we derive an upper bound on the Shannon capacity of the channel. In our channel model, we incorporate the main attributes that characterize DNA-based data storage. That is, information is synthesized on many short DNA strands, and each strand is copied many times. Due to the storage and sequencing methods, the receiver draws strands from the original sequences in an uncontrollable manner, where it is possible that copies of the same sequence are drawn multiple times. Additionally, due to imperfections, the obtained strands can be perturbed by errors. We show that for a large range of parameters, the channel decomposes into sub-channels from each input sequence to multiple output sequences, so-called clusters. The cluster sizes hereby follow a Poisson distribution. Furthermore, the ordering of sub-channels is unknown to the receiver. Our results can be used to guide future experiments for DNA-based data storage by giving an upper bound on the achievable rate of any error-correcting code. We further give a detailed discussion and intuitive interpretation of the channel that provide insights about the nature of the channel and can inspire new ideas for error-correcting codes and decoding methods.
Year
DOI
Venue
2019
10.1109/ITW44776.2019.8989388
2019 IEEE Information Theory Workshop (ITW)
Keywords
Field
DocType
upper bound,DNA storage channel,synthesis technologies,long-term data storage,information theoretic study,storage channel,Shannon capacity,channel model,DNA-based data storage,short DNA strands,sequencing methods,multiple output sequences,error-correcting codes,decoding methods,Poisson distribution
Cluster (physics),Dna storage,Computer science,Upper and lower bounds,Computer data storage,Algorithm,Communication channel,Theoretical computer science,Poisson distribution,Decoding methods,Channel capacity
Conference
ISSN
ISBN
Citations 
2475-420X
978-1-5386-6901-3
1
PageRank 
References 
Authors
0.36
0
4
Name
Order
Citations
PageRank
Andreas Lenz1215.73
Paul H. Siegel21142105.90
Antonia Wachter-Zeh312933.65
Eitan Yaakobi460470.41