Source author record

Ohad Elishco

Ohad Elishco appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Information Theory math.IT math.PR

Catalog footprint

What is connected

7 works

3 topics

4 close collaborators

Preprint, 2024, arXiv

On the Long-Term behavior of $k$-tuples Frequencies in Mutation Systems

In response to the evolving landscape of data storage, researchers have increasingly explored non-traditional platforms, with DNA-based storage emerging as a cutting-edge solution. Our work is motivated by the potential of in-vivo DNA storage, known for its capacity to store vast amounts of information efficiently and confidentially within an organism's native DNA. While promising, in-vivo DNA storage faces challenges, including susceptibility to errors introduced by mutations. To understand the long-term behavior of such mutation systems, we investigate the frequency of $k$-tuples after multiple mutation applications. Drawing inspiration from related works, we generalize results from the study of mutation systems, particularly focusing on the frequency of $k$-tuples. In this work, we provide a broad analysis through the construction of a specialized matrix and the identification of its eigenvectors. In the context of substitution and duplication systems, we leverage previous results on almost sure convergence, equating the expected frequency to the limiting frequency. Moreover, we demonstrate convergence in probability under certain assumptions.

Preprint, 2022, arXiv

Optimal Reference for DNA Synthesis

In recent years, DNA has emerged as a potentially viable storage technology. DNA synthesis, which refers to the task of writing the data into DNA, is perhaps the most costly part of existing storage systems. This high cost and low throughput limit the practical use of available DNA synthesis technologies. It has been found that the homopolymer run (i.e., the repetition of the same nucleotide) is a major factor affecting synthesis and sequencing errors. Quite recently, [26] studied the role of batch optimization in reducing the cost of large-scale DNA synthesis, for a given pool $\mathcal{S}$ of random quaternary strings of fixed length. Among other things, it was shown that the asymptotic cost savings of batch optimization are significantly greater when the strings in $\mathcal{S}$ contain repeats of the same character (homopolymer run of length one), as compared to the case where strings are unconstrained. Following the lead of [26], in this paper we take a step towards the theoretical understanding of DNA synthesis and study the homopolymer run of length $k\geq 1$. Specifically, we are given a set of DNA strands $\mathcal{S}$, randomly drawn from a natural Markovian distribution modeling a general homopolymer run length constraint, that we wish to synthesize. For this problem, we prove that for any $k\geq 1$, the optimal reference strand, minimizing the cost of DNA synthesis, is, perhaps surprisingly, the periodic sequence $\overline{\mathsf{ACGT}}$. It turns out that tackling the homopolymer constraint of length $k\geq 2$ is a challenging problem; our main technical contribution is the representation of the DNA synthesis process as a certain constrained system, for which string techniques can be applied.

Preprint, 2022, arXiv

Recoverable Systems

Motivated by the established notion of storage codes, we consider sets of infinite sequences over a finite alphabet such that every $k$-tuple of consecutive entries is uniquely recoverable from its $l$-neighborhood in the sequence. We address the problem of finding the maximum growth rate of the set, which we term capacity, as well as constructions of explicit families that approach the optimal rate. The techniques that we employ rely on the connection of this problem with constrained systems. In the second part of the paper we consider a modification of the problem wherein the entries in the sequence are viewed as random variables over a finite alphabet that follow some joint distribution, and the recovery condition requires that the Shannon entropy of the $k$-tuple conditioned on its $l$-neighborhood be bounded above by some $\epsilon>0$. We study properties of measures on infinite sequences that maximize the metric entropy under the recoverability condition. Drawing on tools from ergodic theory, we prove some properties of entropy-maximizing measures. We also suggest a procedure of constructing an $\epsilon$-recoverable measure from a corresponding deterministic system.

Preprint, 2020, arXiv

Capacity of dynamical storage systems

We introduce a dynamical model of node repair in distributed storage systems wherein the storage nodes are subjected to failures according to independent Poisson processes. The main parameter that we study is the time-average capacity of the network in the scenario where a fixed subset of the nodes support a higher repair bandwidth than the other nodes. The sequence of node failures generates random permutations of the nodes in the encoded block, and we model the state of the network as a Markov random walk on permutations of $n$ elements. As our main result we show that the capacity of the network can be increased compared to the static (worst-case) model of the storage system, while maintaining the same (average) repair bandwidth, and we derive estimates of the increase. We also quantify the capacity increase in the case that the repair center has information about the sequence of the recently failed storage nodes.

Preprint, 2016, arXiv

Encoding Semiconstrained Systems

Semiconstrained systems were recently suggested as a generalization of constrained systems, commonly used in communication and data-storage applications that require certain offending subsequences be avoided. In an attempt to apply techniques from constrained systems, we study sequences of constrained systems that are contained in, or contain, a given semiconstrained system, while approaching its capacity. In the case of contained systems we describe two such sequences resulting in constant-to-constant bit-rate block encoders and sliding-block encoders. Surprisingly, in the case of containing systems we show that a "generic" semiconstrained system is never contained in a proper fully-constrained system.

Preprint, 2015, arXiv

Semi-constrained Systems

When transmitting information over a noisy channel, two approaches, dating back to Shannon's work, are common: assuming the channel errors are independent of the transmitted content and devising an error-correcting code, or assuming the errors are data dependent and devising a constrained-coding scheme that eliminates all offending data patterns. In this paper we analyze a middle road, which we call a semiconstrained system. In such a system, which is an extension of the channel with cost constraints model, we do not eliminate the error-causing sequences entirely, but rather restrict the frequency with which they appear. We address several key issues in this study. The first is proving closed-form bounds on the capacity, which allow us to bound the asymptotics of the capacity. In particular, we bound the rate at which the capacity of the semiconstrained $(0,k)$-RLL system tends to $1$ as $k$ grows. The second key issue is devising efficient encoding and decoding procedures that asymptotically achieve capacity with vanishing error. Finally, we consider delicate issues involving the continuity of the capacity and a relaxation of the definition of semiconstrained systems.
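As a point of comparison for the statement that the capacity tends to $1$ as $k$ grows, the sketch below computes the capacity of the classical, fully constrained $(0,k)$-RLL system. This is not the semiconstrained system studied in the paper, and none of it comes from the abstract. It assumes the common convention that a $(0,k)$-RLL sequence is a binary sequence with at most $k$ consecutive zeros, so every sequence is a concatenation of phrases $0^i1$ with $0 \le i \le k$, and the capacity is $\log_2 \lambda$ where $\lambda$ solves $\sum_{i=1}^{k+1} \lambda^{-i} = 1$. The function name `rll_capacity` is hypothetical.

```python
import math


def rll_capacity(k: int, tol: float = 1e-12) -> float:
    """Capacity in bits per symbol of the fully constrained (0,k)-RLL system.

    Assumed convention: binary sequences with at most k consecutive zeros.
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    def g(lam: float) -> float:
        # Phrase lengths are 1..k+1; g is strictly decreasing for lam > 0.
        return sum(lam ** -i for i in range(1, k + 2)) - 1.0

    # g(1) = k > 0 and g(2) = -2**-(k+1) < 0, so the root lies in (1, 2).
    lo, hi = 1.0, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return math.log2((lo + hi) / 2.0)


if __name__ == "__main__":
    for k in (1, 2, 4, 8, 16):
        print(k, rll_capacity(k))
```

For $k=1$ the root is the golden ratio, and the printed values increase towards $1$, which is the qualitative behavior the abstract refers to for the semiconstrained variant.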

Preprint, 2012, arXiv

Capacity and coding for the Ising Channel with Feedback

The Ising channel, which was introduced in 1990, is a channel with memory that models intersymbol interference. In this paper we consider the Ising channel with feedback and find the capacity of the channel together with a capacity-achieving coding scheme. To calculate the channel capacity, an equivalent dynamic programming (DP) problem is formulated and solved. Using the DP solution, we establish that the feedback capacity is $C=\frac{2H_b(a)}{3+a}\approx 0.575522$, where $a$ is a particular root of a fourth-degree polynomial and $H_b(x)$ denotes the binary entropy function. Equivalently, $a=\arg \max_{0\leq x \leq 1} \frac{2H_b(x)}{3+x}$. Finally, a simple, error-free, capacity-achieving coding scheme is provided, and a strong connection between the DP results and the coding scheme is outlined.
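The closed-form expression in this abstract can be checked numerically. The sketch below maximizes $2H_b(x)/(3+x)$ over $[0,1]$ with a golden-section search. The search method and the function names (`binary_entropy`, `ising_rate`, `golden_section_max`) are choices made for this illustration and are not taken from the paper, which derives the value through dynamic programming. The search relies on the objective being unimodal on $[0,1]$, which holds because it is a concave positive function divided by a positive linear one.

```python
import math


def binary_entropy(x: float) -> float:
    """Binary entropy H_b(x) in bits, with H_b(0) = H_b(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def ising_rate(x: float) -> float:
    """Objective from the abstract: 2 * H_b(x) / (3 + x)."""
    return 2.0 * binary_entropy(x) / (3.0 + x)


def golden_section_max(f, lo: float, hi: float, tol: float = 1e-12) -> float:
    """Return the maximizer of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if f(c) > f(d):
            hi = d  # the maximum lies in [lo, d]
        else:
            lo = c  # the maximum lies in [c, hi]
    return (lo + hi) / 2.0


a = golden_section_max(ising_rate, 0.0, 1.0)
capacity = ising_rate(a)

if __name__ == "__main__":
    print("a =", a)
    print("C =", capacity)
```

The computed maximum should agree with the value $C\approx 0.575522$ reported in the abstract, up to numerical tolerance.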