
Coded Shotgun Sequencing

Most DNA sequencing technologies are based on the shotgun paradigm: many short reads are obtained from random, unknown locations in the DNA sequence. A fundamental question, studied in arXiv:1203.6233, is what read length and coverage depth (i.e., the total number of reads) are needed to guarantee reliable sequence reconstruction. Motivated by DNA-based storage, we study the coded version of this problem, i.e., the scenario where the DNA molecule being sequenced is a codeword from a predefined codebook. Our main result is an exact characterization of the capacity of the resulting shotgun sequencing channel as a function of the read length and coverage depth. In particular, our results imply that, while the uncoded case requires $O(n)$ reads of length greater than $2\log{n}$ for reliable reconstruction of a length-$n$ binary sequence, the coded case requires only $O(n/\log{n})$ reads of length greater than $\log{n}$ for the capacity to be arbitrarily close to $1$.
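To make the two regimes concrete, the following is a minimal sketch, not the authors' construction. It assumes base-2 logarithms, sets the constants hidden in the $O(\cdot)$ terms to 1, and uses a simplified read model in which each read starts at a uniformly random position and does not wrap around the end of the sequence. The function names `sample_reads`, `covered_fraction`, and `illustrative_regimes` are hypothetical and chosen only for this illustration.

```python
import math
import random


def sample_reads(sequence, num_reads, read_length, rng):
    """Draw fixed-length reads from uniformly random start positions.

    Assumption: reads do not wrap around the end of the sequence.
    """
    last_start = len(sequence) - read_length
    starts = [rng.randint(0, last_start) for _ in range(num_reads)]
    return [(s, sequence[s:s + read_length]) for s in starts]


def covered_fraction(n, reads):
    """Fraction of the n positions that lie inside at least one read."""
    covered = [False] * n
    for start, read in reads:
        for i in range(start, start + len(read)):
            covered[i] = True
    return sum(covered) / n


def illustrative_regimes(n):
    """Read-length thresholds and read-count scalings quoted in the abstract.

    Assumptions: log is base 2 and the hidden O(.) constants are set to 1.
    """
    log_n = math.log2(n)
    return {
        "uncoded": {"min_read_length": 2 * log_n, "num_reads_scale": n},
        "coded": {"min_read_length": log_n, "num_reads_scale": n / log_n},
    }


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 1024
sequence = "".join(rng.choice("01") for _ in range(n))

regimes = illustrative_regimes(n)
# Pick a read length just above log2(n) and a read count on the n / log2(n) scale.
read_length = int(regimes["coded"]["min_read_length"]) + 1
num_reads = round(regimes["coded"]["num_reads_scale"])

reads = sample_reads(sequence, num_reads, read_length, rng)
print("read length:", read_length, "number of reads:", num_reads)
print("fraction of positions covered:", covered_fraction(n, reads))
```

The sketch only samples reads and measures how much of the sequence they touch; it does not attempt reconstruction or decoding, and the capacity expression itself is given in the paper rather than here.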

Authors: Aditya Narayan Ravi, Alireza Vahid, Ilan Shomorony
Topics: Information Theory, math.IT, Applications


preprint / 2022
