Source author record

Eitan Yaakobi

Eitan Yaakobi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Information Theory, math.IT, math.CO, Networking and Internet Architecture, Computational Complexity, Data Structures and Algorithms, Discrete Mathematics

Catalog footprint

What is connected

46 works

7 topics

4 close collaborators

preprint 2026 arXiv

Analyzing Collection Strategies: A Computational Perspective on the Coupon Collector Problem

The Coupon Collector Problem (CCP) is a well-known combinatorial problem that seeks to estimate the number of random draws required to complete a collection of $n$ distinct coupon types. Various generalizations of this problem have been applied in numerous engineering domains. However, practical applications are often hindered by the computational challenges associated with deriving numerical results for moments and distributions. In this work, we present three algorithms for solving the most general form of the CCP, where coupons are collected under an arbitrary drawing distribution, with the objective of obtaining $t$ copies of a subset of $k$ coupons from a total of $n$. The first algorithm provides the base model to compute the expectation, variance, and the second moment of the collection process. The second algorithm utilizes the construction of the base model and computes the same values in polynomial time with respect to $n$ under the uniform drawing distribution, and the third algorithm extends to any general drawing distribution. All algorithms leverage Markov models specifically designed to address computational challenges, ensuring exact computation of the expectation and variance of the collection process. Their implementation uses a dynamic programming approach that follows from the Markov models framework, and their time complexity is analyzed accordingly.
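
As a rough illustration of the Markov-model idea, the sketch below computes the expected number of draws for the simplest special case only ($t=1$ copy of all $k=n$ coupons) under an arbitrary drawing distribution. It is not the paper's algorithm: the function name `expected_draws` and the bitmask state encoding are assumptions made for this example, and the state space is exponential in $n$.

```python
from functools import lru_cache


def expected_draws(probs):
    """Expected draws to see every coupon at least once.

    Hypothetical helper for illustration. `probs` must be positive and
    sum to 1. A state is a bitmask of the coupons collected so far.
    """
    n = len(probs)
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def e(state):
        if state == full:
            return 0.0
        p_new = 0.0   # probability that a draw yields a new coupon
        acc = 1.0     # the draw itself, plus expected remaining draws
        for i in range(n):
            if not state & (1 << i):
                p_new += probs[i]
                acc += probs[i] * e(state | (1 << i))
        # Solving E = acc + (1 - p_new) * E for E removes the self-loop.
        return acc / p_new

    return e(0)


if __name__ == "__main__":
    print(expected_draws([0.5, 0.5]))
    print(expected_draws([0.7, 0.2, 0.1]))
```

For two equally likely coupons this recursion gives 3 draws, which agrees with the classical uniform formula $n H_n$.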

preprint 2026 arXiv

Efficient Synthesis for Two-Dimensional Strand Arrays with Row Constraints

We study the theoretical problem of synthesizing multiple DNA strands under spatial constraints, motivated by large-scale DNA synthesis technologies. In this setting, strands are arranged in an array and synthesized according to a fixed global synthesis sequence, with the restriction that at most one strand per row may be synthesized in any synthesis cycle. We focus on the basic case of two strands in a single row and analyze the expected completion time under this row-constrained model. By decomposing the process into a Markov chain, we derive analytical upper and lower bounds on the expected synthesis time. We show that a simple laggard-first policy achieves an asymptotic expected completion time of (q+3)L/2 for any alphabet of size q, and that no online policy without look-ahead can asymptotically outperform this bound. For the binary case, we show that allowing a single-symbol look-ahead strictly improves performance, yielding an asymptotic expected completion time of 7L/3. Finally, we present a dynamic programming algorithm that computes the optimal offline schedule for any fixed pair of sequences. Together, these results provide the first analytical bounds for synthesis under spatial constraints and lay the groundwork for future studies of optimal synthesis policies in such settings.

preprint 2026 arXiv

Random Access in DNA Storage: Algorithms, Constructions, and Bounds

As DNA data storage moves closer to practical deployment, minimizing sequencing coverage depth is essential to reduce both operational costs and retrieval latency. This paper addresses the recently studied Random Access Problem, which evaluates the expected number of read samples required to recover a specific information strand from $n$ encoded strands. We propose a novel algorithm to compute the exact expected number of reads, achieving a computational complexity of $O(n)$ for fixed field size $q$ and information length $k$. Furthermore, we derive explicit formulas for the average and maximum expected number of reads, enabling an efficient search for optimal generator matrices under small parameters. Beyond theoretical analysis, we present new code constructions that improve the best-known upper bound from $0.8815k$ to $0.8811k$ for $k=3$, and achieve an upper bound of $0.8629k$ for $k=4$ for sufficiently large $q$. We also establish a tighter theoretical lower bound on the expected number of reads that improves upon state-of-the-art bounds. In particular, this bound establishes the optimality of the simple parity code for the case of $n=k+1$ for any alphabet size $q$.

preprint 2026 arXiv

Reconstructing Reed-Solomon Codes from Multiple Noisy Channel Outputs

The sequence reconstruction problem, introduced by Levenshtein in 2001, considers a communication setting in which a sender transmits a codeword and the receiver observes $K$ independent noisy versions of this codeword. In this work, we study the problem of efficient reconstruction when each of the $K$ outputs is corrupted by a $q$-ary discrete memoryless symmetric (DMS) substitution channel with substitution probability $p$. Focusing on Reed-Solomon (RS) codes, we adapt the Koetter-Vardy soft-decision decoding algorithm to obtain an efficient reconstruction algorithm. For sufficiently large blocklength and alphabet size, we derive an explicit rate threshold, depending only on $(p, K)$, such that the transmitted codeword can be reconstructed with arbitrarily small probability of error whenever the code rate $R$ lies below this threshold.

preprint 2024 arXiv

Byzantine-Resilient Gradient Coding through Local Gradient Computations

We consider gradient coding in the presence of an adversary controlling so-called malicious workers trying to corrupt the computations. Previous works propose the use of MDS codes to treat the responses from malicious workers as errors and correct them using the error-correction properties of the code. This comes at the expense of increasing the replication, i.e., the number of workers each partial gradient is computed by. In this work, we propose a way to reduce the replication to $s+1$ instead of $2s+1$ in the presence of $s$ malicious workers. Our method detects erroneous inputs from the malicious workers, transforming them into erasures. This comes at the expense of $s$ additional local computations at the main node and additional rounds of light communication between the main node and the workers. We define a general framework and give fundamental limits for fractional repetition data allocations. Our scheme is optimal in terms of replication and local computation and incurs a communication cost that is asymptotically, in the size of the dataset, a multiplicative factor away from the derived bound. We furthermore show how additional redundancy can be exploited to reduce the number of local computations and communication cost, or, alternatively, tolerate straggling workers.

preprint 2022 arXiv

Codes for Constrained Periodicity

Reliability is an inherent challenge for the emerging nonvolatile technology of racetrack memories, and there exists a fundamental relationship between codes designed for racetrack memories and codes with constrained periodicity. Previous works have sought to construct codes that avoid periodicity in windows, yet have either only provided existence proofs or required high redundancy. This paper provides the first constructions for avoiding periodicity that are both efficient (average-linear time) and with low redundancy (near the lower bound). The proposed algorithms are based on iteratively repairing windows which contain periodicity until all the windows are valid. Intuitively, such algorithms should not converge as there is no monotonic progression; yet, we prove convergence with average-linear time complexity by exploiting subtle properties of the encoder. Overall, we provide constructions that avoid periodicity in all windows and study the cardinality of such constraints.

preprint 2022 arXiv

Covering Sequences for $\ell$-Tuples

De Bruijn sequences of order $\ell$, i.e., sequences that contain each $\ell$-tuple as a window exactly once, have found many diverse applications in information theory and most recently in DNA storage. This family of binary sequences has a rate of $1/2$. To overcome this low rate, we study $\ell$-tuples covering sequences, which impose that each $\ell$-tuple appears at least once as a window in the sequence. The cardinality of this family of sequences is analyzed while assuming that $\ell$ is a function of the sequence length $n$. Lower and upper bounds on the asymptotic rate of this family are given. Moreover, we study an upper bound for $\ell$ such that the redundancy of the set of $\ell$-tuples covering sequences is at most a single symbol. Lastly, we present efficient encoding and decoding schemes for $\ell$-tuples covering sequences that meet this bound.
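
The covering property described above can be checked directly by enumerating windows. The sketch below is a brute-force membership test, not one of the paper's encoding schemes; the function name `is_covering` is hypothetical, and it assumes non-cyclic windows over a binary alphabet by default, which the abstract does not specify.

```python
from itertools import product


def is_covering(seq, ell, alphabet="01"):
    """Return True if every ell-tuple over `alphabet` occurs as a window of `seq`.

    Assumption for this example: windows are taken without wrap-around.
    """
    windows = {seq[i:i + ell] for i in range(len(seq) - ell + 1)}
    return all("".join(t) in windows for t in product(alphabet, repeat=ell))


if __name__ == "__main__":
    print(is_covering("00110", 2))       # every 2-tuple appears
    print(is_covering("0011", 2))        # the tuple "10" is missing
    print(is_covering("0001011100", 3))  # every 3-tuple appears exactly once
```

A de Bruijn sequence is the extreme case in which each window appears exactly once; a covering sequence may repeat windows, which is what allows the family to be much larger.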

preprint 2022 arXiv

Equivalence of Insertion/Deletion Correcting Codes for $d$-dimensional Arrays

We consider the problem of correcting insertion and deletion errors in the $d$-dimensional space. This problem is well understood for vectors (one-dimensional space) and was recently studied for arrays (two-dimensional space). For vectors and arrays, the problem is motivated by several practical applications such as DNA-based storage and racetrack memories. From a theoretical perspective, it is interesting to know whether the same properties of insertion/deletion correcting codes generalize to the $d$-dimensional space. In this work, we show that the equivalence between insertion and deletion correcting codes generalizes to the $d$-dimensional space. As a particular result, we show the following missing equivalence for arrays: a code that can correct $t_\mathrm{r}$ and $t_\mathrm{c}$ row/column deletions can correct any combination of $t_\mathrm{r}^{\mathrm{ins}}+t_\mathrm{r}^{\mathrm{del}}=t_\mathrm{r}$ and $t_\mathrm{c}^{\mathrm{ins}}+t_\mathrm{c}^{\mathrm{del}}=t_\mathrm{c}$ row/column insertions and deletions. The fundamental limit on the redundancy and a construction of insertion/deletion correcting codes in the $d$-dimensional space remain open for future work.

preprint 2022 arXiv

On the Size of Balls and Anticodes of Small Diameter under the Fixed-Length Levenshtein Metric

The rapid development of DNA storage has brought the deletion and insertion channel to the front line of research. When the number of deletions is equal to the number of insertions, the Fixed-Length Levenshtein (FLL) metric is the right measure for the distance between two words of the same length. As with any other metric, the size of a ball is one of the most fundamental parameters. In this work, we consider the minimum, maximum, and average size of a ball with radius one, in the FLL metric. The related minimum and the maximum size of a maximal anticode with diameter one are also considered.
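
To make the radius-one ball concrete, the following brute-force sketch enumerates every word reachable from a given word by one deletion followed by one insertion. It assumes that this is what FLL distance at most one means, in line with the description above; the helper names are hypothetical, and the approach is only practical for short words.

```python
def deletions(word):
    """All words obtained by deleting exactly one symbol."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}


def insertions(word, alphabet="01"):
    """All words obtained by inserting exactly one symbol."""
    return {word[:i] + a + word[i:]
            for i in range(len(word) + 1) for a in alphabet}


def fll_ball_radius_one(word, alphabet="01"):
    """Words reachable by one deletion then one insertion (includes `word`)."""
    return {y for d in deletions(word) for y in insertions(d, alphabet)}


if __name__ == "__main__":
    for w in ("0000", "0011", "0101"):
        print(w, len(fll_ball_radius_one(w)))
```

Running it on words with different run structures shows that the ball size depends on the center, which is why minimum, maximum, and average sizes are separate questions.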

preprint 2022 arXiv

Reconstruction from Substrings with Partial Overlap

This paper introduces a new family of reconstruction codes which is motivated by applications in DNA data storage and sequencing. In such applications, DNA strands are sequenced by reading some subset of their substrings. While previous works considered two extreme cases in which all substrings of some fixed length are read or substrings are read with no overlap, this work considers the setup in which consecutive substrings are read with some given minimum overlap. First, upper bounds are provided on the attainable rates of codes that guarantee unique reconstruction. Then, we present efficient constructions of asymptotically optimal codes that meet the upper bound.

preprint 2022 arXiv

The Input and Output Entropies of the $k$-Deletion/Insertion Channel

The channel output entropy of a transmitted word is the entropy of the possible channel outputs and similarly, the input entropy of a received word is the entropy of all possible transmitted words. The goal of this work is to study these entropy values for the $k$-deletion and $k$-insertion channels, where exactly $k$ symbols are deleted from, or inserted into, the transmitted word, respectively. If all possible words are transmitted with the same probability then studying the input and output entropies is equivalent. For both the 1-insertion and 1-deletion channels, it is proved that among all words with a fixed number of runs, the input entropy is minimized for words with a skewed distribution of their run lengths and it is maximized for words with a balanced distribution of their run lengths. Among our results, we establish a conjecture by Atashpendar et al. which claims that for the binary 1-deletion channel, the input entropy is maximized for the alternating words. This conjecture is also verified for the 2-deletion channel, where it is proved that constant words with a single run minimize the input entropy.
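
As a small numerical illustration of the output-entropy definition, the sketch below computes the entropy of the outputs of a 1-deletion channel for a given transmitted word. It assumes the deleted position is chosen uniformly at random, which is an assumption of this example rather than something stated above, and it does not compute the input entropy that the main results concern. The function name is hypothetical.

```python
from collections import Counter
from math import log2


def deletion_output_entropy(word):
    """Entropy (in bits) of the 1-deletion channel output for `word`.

    Assumption: each of the len(word) positions is deleted with equal
    probability, so an output's probability is (number of positions
    producing it) / len(word).
    """
    n = len(word)
    counts = Counter(word[:i] + word[i + 1:] for i in range(n))
    return -sum((c / n) * log2(c / n) for c in counts.values())


if __name__ == "__main__":
    for w in ("0000", "0011", "0001", "0101"):
        print(w, deletion_output_entropy(w))
```

Deleting any symbol inside a run yields the same output, so under this assumption the output distribution is determined by the run lengths: a constant word gives zero entropy and an alternating word gives the largest value.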

preprint 2021 arXiv

Codes over Trees

In graph theory, a tree is one of the more popular families of graphs with a wide range of applications in computer science as well as many other related fields. While there are several distance measures over the set of all trees, we consider here the one which defines the so-called tree distance, defined by the minimum number of edit operations, of removing and adding edges, in order to change one tree into another. From a coding theoretic perspective, codes over the tree distance are used for the correction of edge erasures and errors. However, studying this distance measure is important for many other applications that use trees and properties on their locality and the number of neighbor trees. Under this paradigm, the largest size of a code over trees with a prescribed minimum tree distance is investigated. Upper bounds on these codes as well as code constructions are presented. A significant part of our study is dedicated to the problem of calculating the size of the ball of trees of a given radius. These balls are not regular and thus we show that while the star tree has asymptotically the smallest size of the ball, the maximum is achieved for the path tree.

preprint 2021 arXiv

Correctable Erasure Patterns in Product Topologies

Locality enables storage systems to recover failed nodes from small subsets of surviving nodes. The setting where nodes are partitioned into subsets, each allowing for local recovery, is well understood. In this work we consider a generalization introduced by Gopalan et al., where, viewing the codewords as arrays, constraints are imposed on the columns and rows in addition to some global constraints. Specifically, we present a generic method of adding such global parity-checks and derive new results on the set of correctable erasure patterns. Finally, we relate the set of correctable erasure patterns in the considered topology to those correctable in tensor-product codes.

preprint 2021 arXiv

Multi-strand Reconstruction from Substrings

The problem of string reconstruction based on its substrings spectrum has received significant attention recently due to its applicability to DNA data storage and sequencing. In contrast to previous works, we consider in this paper a setup of this problem where multiple strings are reconstructed together. Given a multiset $S$ of strings, all their substrings of some fixed length $\ell$, defined as the $\ell$-profile of $S$, are received and the goal is to reconstruct all strings in $S$. A multi-strand $\ell$-reconstruction code is a set of multisets such that every element $S$ can be reconstructed from its $\ell$-profile. Given the number of strings $k$ and their length $n$, we first find a lower bound on the value of $\ell$ necessary for existence of multi-strand $\ell$-reconstruction codes with non-vanishing asymptotic rate. We then present two constructions of such codes and show that their rates approach $1$ for values of $\ell$ that asymptotically behave like the lower bound.
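
The $\ell$-profile defined above is simple to compute, and a small example shows why $\ell$ must be large enough for reconstruction. The sketch below is illustrative only: the function name `ell_profile` is hypothetical, and the profile is represented as a `Counter` so that repeated substrings keep their multiplicity.

```python
from collections import Counter


def ell_profile(strings, ell):
    """Multiset of all length-ell substrings of every string in `strings`."""
    profile = Counter()
    for s in strings:
        for i in range(len(s) - ell + 1):
            profile[s[i:i + ell]] += 1
    return profile


if __name__ == "__main__":
    # Two different strands with the same 2-profile cannot be told apart...
    print(ell_profile(["0010"], 2) == ell_profile(["0100"], 2))
    # ...but their 3-profiles differ.
    print(ell_profile(["0010"], 3) == ell_profile(["0100"], 3))
```

A reconstruction code has to exclude at least one of any two multisets whose profiles collide in this way.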

preprint 2021 arXiv

The Capacity of Single-Server Weakly-Private Information Retrieval

A private information retrieval (PIR) protocol guarantees that a user can privately retrieve files stored in a database without revealing any information about the identity of the requested file. Existing information-theoretic PIR protocols ensure perfect privacy, i.e., zero information leakage to the servers storing the database, but at the cost of high download. In this work, we present weakly-private information retrieval (WPIR) schemes that trade off perfect privacy to improve the download cost when the database is stored on a single server. We study the tradeoff between the download cost and information leakage in terms of mutual information (MI) and maximal leakage (MaxL) privacy metrics. By relating the WPIR problem to rate-distortion theory, the download-leakage function, which is defined as the minimum required download cost of all single-server WPIR schemes for a given level of information leakage and a fixed file size, is introduced. By characterizing the download-leakage function for the MI and MaxL metrics, the capacity of single-server WPIR is fully described.

preprint 2021 arXiv

The Zero Cubes Free and Cubes Unique Multidimensional Constraints

This paper studies two families of constraints for two-dimensional and multidimensional arrays. The first family requires that a multidimensional array will not contain a cube of zeros of some fixed size and the second family requires that there will not be two identical cubes of a given size in the array. These constraints are natural extensions of their one-dimensional counterparts that have been rigorously studied recently. For both of these constraints we present conditions on the size of the cube for which the asymptotic rate of the set of valid arrays approaches 1 as well as conditions for the redundancy to be at most a single symbol. For the first family we present an efficient encoding algorithm that uses a single symbol to encode arbitrary information into a valid array and for the second family we present a similar encoder for the two-dimensional case. The results in the paper are also extended to similar constraints where the sub-array is not necessarily a cube, but a box of arbitrary dimensions and only its volume is bounded.

preprint 2020 arXiv

Array Codes for Functional PIR and Batch Codes

A functional PIR array code is a coding scheme which encodes some $s$ information bits into a $t\times m$ array such that every linear combination of the $s$ information bits has $k$ mutually disjoint recovering sets. Every recovering set consists of some of the array's columns while it is allowed to read at most $\ell$ encoded bits from every column in order to receive the requested linear combination of the information bits. Functional batch array codes impose a stronger property where every multiset request of $k$ linear combinations has $k$ mutually disjoint recovering sets. Locality functional array codes demand that the size of every recovering set is restrained to be at most $r$. Given the values of $s, k, t, \ell,r$, the goal of this paper is to study the optimal value of the number of columns $m$ such that these codes exist. Several lower bounds are presented as well as explicit constructions for several of these parameters.

preprint 2020 arXiv

Coding for Sequence Reconstruction for Single Edits

The sequence reconstruction problem, introduced by Levenshtein in 2001, considers a communication scenario where the sender transmits a codeword from some codebook and the receiver obtains multiple noisy reads of the codeword. The common setup assumes the codebook to be the entire space and the problem is to determine the minimum number of distinct reads that is required to reconstruct the transmitted codeword. Motivated by modern storage devices, we study a variant of the problem where the number of noisy reads $N$ is fixed. Specifically, we design reconstruction codes that reconstruct a codeword from $N$ distinct noisy reads. We focus on channels that introduce a single edit error (i.e., a single substitution, insertion, or deletion) and their variants, and design reconstruction codes for all values of $N$. In particular, for the case of a single edit, we show that as the number of noisy reads increases, the number of redundant bits required can be gracefully reduced from $\log n+O(1)$ to $\log \log n+O(1)$, and then to $O(1)$, where $n$ denotes the length of a codeword. We also show that the redundancy of certain reconstruction codes is within one bit of optimality.
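
The common setup described above, where the codebook is the entire space, can be explored by brute force for the single-deletion channel. In the sketch below, the number of distinct reads needed is one more than the largest intersection between the deletion balls of two different words. The function names are hypothetical and the search is exponential in $n$, so this is only a sanity check for small lengths, not a construction from the paper.

```python
from itertools import combinations, product


def deletion_ball(word):
    """All words obtained from `word` by deleting exactly one symbol."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}


def max_ball_intersection(n, alphabet="01"):
    """Largest number of reads shared by two distinct length-n words."""
    words = ["".join(w) for w in product(alphabet, repeat=n)]
    return max(len(deletion_ball(x) & deletion_ball(y))
               for x, y in combinations(words, 2))


if __name__ == "__main__":
    for n in (3, 4, 5):
        shared = max_ball_intersection(n)
        print(n, "reads needed without coding:", shared + 1)
```

Restricting the codebook so that no two codewords share more than $N-1$ reads is what allows reconstruction from a fixed number $N$ of reads.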

preprint 2020 arXiv

Coding over Sets for DNA Storage

In this paper we study error-correcting codes for the storage of data in synthetic deoxyribonucleic acid (DNA). We investigate a storage model where a data set is represented by an unordered set of $M$ sequences, each of length $L$. Errors within that model are a loss of whole sequences and point errors inside the sequences, such as insertions, deletions and substitutions. We derive Gilbert-Varshamov lower bounds and sphere packing upper bounds on achievable cardinalities of error-correcting codes within this storage model. We further propose explicit code constructions that can correct errors in such a storage system and that can be encoded and decoded efficiently. Comparing the sizes of these codes to the upper bounds, we show that many of the constructions are close to optimal.

preprint 2020 arXiv

Constrained de Bruijn Codes: Properties, Enumeration, Constructions, and Applications

The de Bruijn graph, its sequences, and their various generalizations, have found many applications in information theory, including many new ones in the last decade. In this paper, motivated by a coding problem for emerging memory technologies, a set of sequences which generalize sequences in the de Bruijn graph are defined. These sequences can also be defined and viewed as constrained sequences. Hence, they will be called constrained de Bruijn sequences and a set of such sequences will be called a constrained de Bruijn code. Several properties and alternative definitions for such codes are examined and they are analyzed as generalized sequences in the de Bruijn graph (and its generalization) and as constrained sequences. Various enumeration techniques are used to compute the total number of sequences for any given set of parameters. A construction method of such codes from the theory of shift-register sequences is proposed. Finally, we show how these constrained de Bruijn sequences and codes can be applied in constructions of codes for correcting synchronization errors in the $\ell$-symbol read channel and in the racetrack memory channel. For this purpose, these codes are superior in size to previously known codes.

preprint 2020 arXiv

Covering Codes using Insertions or Deletions

A covering code is a set of codewords with the property that the union of balls, suitably defined, around these codewords covers an entire space. Generally, the goal is to find the covering code with the minimum size codebook. While most prior work on covering codes has focused on the Hamming metric, we consider the problem of designing covering codes defined in terms of either insertions or deletions. First, we provide new sphere-covering lower bounds on the minimum possible size of such codes. Then, we provide new existential upper bounds on the size of optimal covering codes for a single insertion or a single deletion that are tight up to a constant factor. Finally, we derive improved upper bounds for covering codes using $R\geq 2$ insertions or deletions. We prove that codes exist with density that is only a factor $O(R \log R)$ larger than the lower bounds for all fixed $R$. In particular, our upper bounds have an optimal dependence on the word length, and we achieve asymptotic density matching the best known bounds for Hamming distance covering codes.

preprint 2020 arXiv

Optimal Reconstruction Codes for Deletion Channels

The sequence reconstruction problem, introduced by Levenshtein in 2001, considers a communication scenario where the sender transmits a codeword from some codebook and the receiver obtains multiple noisy reads of the codeword. Motivated by modern storage devices, we introduced a variant of the problem where the number of noisy reads $N$ is fixed (Kiah et al. 2020). Of significance, for the single-deletion channel, using $\log_2\log_2 n +O(1)$ redundant bits, we designed a reconstruction code of length $n$ that reconstructs codewords from two distinct noisy reads. In this work, we show that $\log_2\log_2 n -O(1)$ redundant bits are necessary for such reconstruction codes, thereby demonstrating the optimality of our previous construction. Furthermore, we show that these reconstruction codes can be used in $t$-deletion channels (with $t\ge 2$) to uniquely reconstruct codewords from $n^{t-1}+O\left(n^{t-2}\right)$ distinct noisy reads.

preprint 2020 arXiv

Partial MDS Codes with Local Regeneration

Partial MDS (PMDS) and sector-disk (SD) codes are classes of erasure codes that combine locality with strong erasure correction capabilities. We construct PMDS and SD codes where each local code is a bandwidth-optimal regenerating MDS code. The constructions require a significantly smaller field size than the only other construction known in the literature.

preprint 2020 arXiv

PIR Codes with Short Block Length

In this work private information retrieval (PIR) codes are studied. In a $k$-PIR code, $s$ information bits are encoded in such a way that every information bit has $k$ mutually disjoint recovery sets. The main problem under this paradigm is to minimize the number of encoded bits given the values of $s$ and $k$, where this value is denoted by $P(s,k)$. The main focus of this work is to analyze $P(s,k)$ for a large range of parameters of $s$ and $k$. In particular, we improve upon several of the existing results on this value.

preprint 2020 arXiv

Single-Deletion Single-Substitution Correcting Codes

Correcting insertions/deletions as well as substitution errors simultaneously plays an important role in DNA-based storage systems as well as in classical communications. This paper deals with the fundamental task of constructing codes that can correct a single insertion or deletion along with a single substitution. A non-asymptotic upper bound on the size of single-deletion single-substitution correcting codes is derived, showing that the redundancy of such a code of length $n$ has to be at least $2 \log n$. The bound is presented both for binary and non-binary codes while an extension to single deletion and multiple substitutions is presented for binary codes. An explicit construction of single-deletion single-substitution correcting codes with at most $6 \log n + 8$ redundancy bits is derived. Note that the best known construction for this problem has to use 3-deletion correcting codes whose best known redundancy is roughly $24 \log n$.

preprint 2020 arXiv

The Error Probability of Maximum-Likelihood Decoding over Two Deletion Channels

This paper studies the problem of reconstructing a word given several of its noisy copies. This setup is motivated by several applications, among them reconstructing strands in DNA-based storage systems. Under this paradigm, a word is transmitted over some fixed number of identical independent channels and the goal of the decoder is to output the transmitted word or some close approximation. The main focus of this paper is the case of two deletion channels and studying the error probability of the maximum-likelihood (ML) decoder under this setup. First, it is discussed how the ML decoder operates. Then, we observe that the dominant error patterns are deletions in the same run or errors resulting from alternating sequences. Based on these observations, it is derived that the error probability of the ML decoder is roughly $\frac{3q-1}{q-1}p^2$, when the transmitted word is any $q$-ary sequence and $p$ is the channel's deletion probability. We also study the cases when the transmitted word belongs to the Varshamov-Tenengolts (VT) code or the shifted VT code. Lastly, the insertion channel is studied as well. These theoretical results are verified by corresponding simulations.
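
For readers unfamiliar with the VT code mentioned above, the sketch below lists the binary VT codewords of a given length using the standard definition (the weighted sum $\sum_i i\,x_i$ is congruent to $a$ modulo $n+1$) and checks by brute force that their single-deletion balls are disjoint. It does not implement the ML decoder or the shifted VT code, and the function names are hypothetical.

```python
from itertools import product


def vt_syndrome(bits):
    """Weighted checksum sum(i * x_i) mod (n + 1), positions counted from 1."""
    return sum(i * b for i, b in enumerate(bits, start=1)) % (len(bits) + 1)


def vt_code(n, a=0):
    """All binary words of length n with VT syndrome equal to a."""
    return [w for w in product((0, 1), repeat=n)
            if vt_syndrome(w) == a % (n + 1)]


def deletion_ball(word):
    """All words obtained from `word` by deleting exactly one symbol."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}


def corrects_single_deletion(code):
    """True if no two codewords share a single-deletion output."""
    seen = set()
    for c in code:
        ball = deletion_ball(c)
        if seen & ball:
            return False
        seen |= ball
    return True


if __name__ == "__main__":
    print(vt_code(4))
    print(corrects_single_deletion(vt_code(6)))
```

Disjoint deletion balls mean that a single read already identifies the codeword after one deletion, which makes the VT code a natural baseline when studying how a second read reduces the error probability.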

preprint 2016 arXiv

Bounds and Constructions of Codes with Multiple Localities

This paper studies bounds and constructions of locally repairable codes (LRCs) with multiple localities, so-called multiple-locality LRCs (ML-LRCs). In the simplest case of two localities, some code symbols of an ML-LRC have a certain locality while the remaining code symbols have another one. We extend two bounds, the Singleton and the alphabet-dependent upper bound on the dimension of Cadambe-Mazumdar for LRCs, to the case of ML-LRCs with more than two localities. Furthermore, we construct Singleton-optimal ML-LRCs as well as codes that achieve the extended alphabet-dependent bound. We give a family of binary ML-LRCs based on generalized code concatenation that is optimal with respect to the alphabet-dependent bound.

preprint 2016 arXiv

Codes Correcting a Burst of Deletions or Insertions

This paper studies codes that correct bursts of deletions. Namely, a code will be called a $b$-burst-deletion-correcting code if it can correct a deletion of any $b$ consecutive bits. While the lower bound on the redundancy of such codes was shown by Levenshtein to be asymptotically $\log(n)+b-1$, the redundancy of the best code construction by Cheng et al. is $b(\log (n/b+1))$. In this paper we narrow this gap and provide codes with redundancy at most $\log(n) + (b-1)\log(\log(n)) +b -\log(b)$. We also derive a non-asymptotic upper bound on the size of $b$-burst-deletion-correcting codes and extend the burst deletion model to two more cases: 1) a deletion burst of at most $b$ consecutive bits and 2) a deletion burst of size at most $b$ (not necessarily consecutive). We extend our code construction for the first case and study the second case for $b=3,4$. The equivalent models for insertions are also studied and are shown to be equivalent to correcting the corresponding burst of deletions.

preprint 2016 arXiv

Coding for Locality in Reconstructing Permutations

The problem of storing permutations in a distributed manner arises in several common scenarios, such as efficient updates of a large, encrypted, or compressed data set. This problem may be addressed in either a combinatorial or a coding approach. The former approach boils down to presenting large sets of permutations with locality, that is, any symbol of the permutation can be computed from a small set of other symbols. In the latter approach, a permutation may be coded in order to achieve locality. This paper focuses on the combinatorial approach. We provide upper and lower bounds for the maximal size of a set of permutations with locality, and provide several simple constructions which attain the upper bound. In cases where the upper bound is not attained, we provide alternative constructions using Reed-Solomon codes, permutation polynomials, and multi-permutations.

preprint 2016 arXiv

Performance of Multilevel Flash Memories with Different Binary Labelings: A Multi-User Perspective

In this work, we study the performance of different decoding schemes for multilevel flash memories where each page in every block is encoded independently. We focus on the multi-level cell (MLC) flash memory, which is modeled as a two-user multiple access channel suffering from asymmetric noise. The uniform rate regions and sum rates of Treating Interference as Noise (TIN) decoding and Successive Cancelation (SC) decoding are investigated for a Program/Erase (P/E) cycling model and a data retention model. We examine the effect of different binary labelings of the cell levels, as well as the impact of further quantization of the memory output (i.e., additional read thresholds). Finally, we extend our analysis to the three-level cell (TLC) flash memory.

preprint2015arXiv

Binary Linear Locally Repairable Codes

Locally repairable codes (LRCs) are a class of codes designed for the local correction of erasures. They have received considerable attention in recent years due to their applications in distributed storage. Most existing results on LRCs do not explicitly take into consideration the field size $q$, i.e., the size of the code alphabet. In particular, for the binary case, only a few results are known. In this work, we present an upper bound on the minimum distance $d$ of linear LRCs with availability, based on the work of Cadambe and Mazumdar. The bound takes into account the code length $n$, dimension $k$, locality $r$, availability $t$, and field size $q$. Then, we study binary linear LRCs in three aspects. First, we focus on analyzing the locality of some classical codes, i.e., cyclic codes and Reed-Muller codes, and their modified versions, which are obtained by applying the operations of extend, shorten, expurgate, augment, and lengthen. Next, we construct LRCs using phantom parity-check symbols and multi-level tensor product structure, respectively. Compared to other previous constructions of binary LRCs with fixed locality or minimum distance, our construction is much more flexible in terms of code parameters, and gives various families of high-rate LRCs, some of which are shown to be optimal with respect to their minimum distance. Finally, availability of LRCs is studied. We investigate the locality and availability properties of several classes of one-step majority-logic decodable codes, including cyclic simplex codes, cyclic difference-set codes, and $4$-cycle free regular low-density parity-check (LDPC) codes. We also show the construction of a long LRC with availability from a short one-step majority-logic decodable code.

preprint2015arXiv

Codes for Partially Stuck-at Memory Cells

In this work, we study a new model of defective memory cells, called partially stuck-at memory cells, which is motivated by the behavior of multi-level cells in non-volatile memories such as flash memories and phase change memories. If a cell can store the $q$ levels $0, 1, \dots, q-1$, we say that it is partially stuck-at level $s$, where $1 \leq s \leq q-1$, if it can only store values which are at least $s$. We follow the common setup where the encoder knows the positions and levels of the partially stuck-at cells whereas the decoder does not. Our main contribution in the paper is the study of codes for masking $u$ partially stuck-at cells. We first derive lower and upper bounds on the redundancy of such codes. The upper bounds are based on two trivial constructions. We then present three code constructions over an alphabet of size $q$, by first considering the case where the cells are partially stuck-at level $s=1$. The first construction works for $u<q$ and is asymptotically optimal if $u+1$ divides $q$. The second construction uses the reduced row echelon form of matrices to generate codes for the case $u\geq q$, and the third construction solves the case of arbitrary $u$ by using codes which mask binary stuck-at cells. We then show how to generalize all constructions to arbitrary stuck levels. Furthermore, we study the dual defect model in which cells cannot reach higher levels, and show that codes for partially stuck-at cells can be used to mask this type of defect as well. Lastly, we analyze the capacity of the partially stuck-at memory channel and study how far our constructions are from the capacity.
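The masking idea for level $s=1$ and $u<q$ can be illustrated with a short Python sketch. It is a generic shift-based scheme chosen for illustration and is not claimed to be any of the paper's three constructions: one extra cell stores a shift $z$, every cell is increased by $z$ modulo $q$, and since each stuck cell rules out at most one value of $z$, fewer than $q$ stuck cells always leave a valid shift. The function names and the choice of the first cell as the redundancy cell are assumptions.

```python
def mask_partially_stuck(message, stuck_positions, q):
    """Encode a q-ary message of length n-1 into n cells so that every
    cell listed in stuck_positions (partially stuck at level 1) is nonzero."""
    base = [0] + list(message)  # cell 0 is the single redundancy cell
    for z in range(q):
        word = [(x + z) % q for x in base]
        if all(word[i] != 0 for i in stuck_positions):
            return word
    raise ValueError("no valid shift; this sketch needs fewer than q stuck cells")


def unmask(word, q):
    """Decode without knowing the stuck positions: cell 0 holds the shift."""
    z = word[0]
    return [(x - z) % q for x in word[1:]]
```

With `q = 3`, message `[0, 2, 1]` and stuck cells at positions 0 and 2, the shifts 0 and 1 each leave a stuck cell at value 0, so the encoder settles on shift 2, and `unmask` recovers the message from the stored word alone.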

preprint2015arXiv

Optimal Linear and Cyclic Locally Repairable Codes over Small Fields

We consider locally repairable codes over small fields and propose constructions of optimal cyclic and linear codes in terms of the dimension for a given distance and length. Four new constructions of optimal linear codes over small fields with locality properties are developed. The first two approaches give binary cyclic codes with locality two. While the first construction has availability one, the second binary code is characterized by multiple available repair sets based on a binary Simplex code. The third approach extends the first one to q-ary cyclic codes including (binary) extension fields, where the locality property is determined by the properties of a shortened first-order Reed-Muller code. Non-cyclic optimal binary linear codes with locality greater than two are obtained by the fourth construction.

preprint2015arXiv

PIR with Low Storage Overhead: Coding instead of Replication

Private information retrieval (PIR) protocols allow a user to retrieve a data item from a database without revealing any information about the identity of the item being retrieved. Specifically, in information-theoretic $k$-server PIR, the database is replicated among $k$ non-communicating servers, and each server learns nothing about the item retrieved by the user. The cost of PIR protocols is usually measured in terms of their communication complexity, which is the total number of bits exchanged between the user and the servers, and storage overhead, which is the ratio between the total number of bits stored on all the servers and the number of bits in the database. Since single-server information-theoretic PIR is impossible, the storage overhead of all existing PIR protocols is at least $2$. In this work, we show that information-theoretic PIR can be achieved with storage overhead arbitrarily close to the optimal value of $1$, without sacrificing the communication complexity. Specifically, we prove that all known $k$-server PIR protocols can be efficiently emulated, while preserving both privacy and communication complexity but significantly reducing the storage overhead. To this end, we distribute the $n$ bits of the database among $s+r$ servers, each storing $n/s$ coded bits (rather than replicas). For every fixed $k$, the resulting storage overhead $(s+r)/s$ approaches $1$ as $s$ grows; explicitly we have $r\le k\sqrt{s}(1+o(1))$. Moreover, in the special case $k = 2$, the storage overhead is only $1 + \frac{1}{s}$. In order to achieve these results, we introduce and study a new kind of binary linear codes, called here $k$-server PIR codes. We then show how such codes can be constructed, and we establish several bounds on the parameters of $k$-server PIR codes. Finally, we briefly discuss extensions of our results to nonbinary alphabets, to robust PIR, and to $t$-private PIR.
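As a small illustration of the storage overhead $(s+r)/s$, the Python sketch below uses a single-parity layout for $k=2$: the database is split into $s$ parts, $s$ servers store one part each, and one extra server stores their XOR. The abstract does not say which code achieves overhead $1+\frac{1}{s}$, so this layout is an assumption that merely matches that figure. The sketch checks that every part can be recovered from two disjoint groups of servers, which is the property a two-server protocol needs; the function names are illustrative.

```python
def single_parity_columns(s):
    """Columns of the s x (s+1) binary matrix [I | 1]; column j describes
    which database parts are XORed together on server j."""
    columns = [tuple(1 if row == col else 0 for row in range(s)) for col in range(s)]
    columns.append(tuple(1 for _ in range(s)))  # parity server
    return columns


def xor_columns(columns, s):
    """Sum of the given columns over GF(2)."""
    acc = [0] * s
    for column in columns:
        acc = [a ^ b for a, b in zip(acc, column)]
    return tuple(acc)


def has_two_disjoint_recovery_sets(s):
    """For each part i, server i alone and all remaining servers together
    must both sum to the unit vector e_i."""
    columns = single_parity_columns(s)
    for i in range(s):
        unit = tuple(1 if row == i else 0 for row in range(s))
        alone = [columns[i]]
        others = [col for j, col in enumerate(columns) if j != i]
        if xor_columns(alone, s) != unit or xor_columns(others, s) != unit:
            return False
    return True


def storage_overhead(s, r):
    """Total stored bits divided by database bits when each server stores n/s bits."""
    return (s + r) / s
```

For `s = 4` and `r = 1` the overhead is `1.25`, compared with 2 for plain two-server replication.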

preprint2015arXiv

When Do WOM Codes Improve the Erasure Factor in Flash Memories?

Flash memory is a write-once medium in which reprogramming cells requires first erasing the block that contains them. The lifetime of the flash is a function of the number of block erasures and can be as small as several thousands. To reduce the number of block erasures, pages, which are the smallest write unit, are rewritten out-of-place in the memory. A write-once memory (WOM) code is a coding scheme that enables writing multiple times to the block before an erasure. However, these codes come with significant rate loss. For example, the rate for writing twice (with the same rate) is at most 0.77. In this paper, we study WOM codes and their tradeoff between rate loss and reduction in the number of block erasures, when pages are written uniformly at random. First, we introduce a new measure, called erasure factor, that reflects both the number of block erasures and the amount of data that can be written on each block. A key point in our analysis is that this tradeoff depends upon the specific implementation of WOM codes in the memory. We consider two systems that use WOM codes: a conventional scheme that was commonly used, and a recent design that preserves the overall storage capacity. While the first system can improve the erasure factor only when the storage rate is at most 0.6442, we show that the second scheme always improves this figure of merit.

preprint2014arXiv

Constrained Codes for Rank Modulation

Motivated by the rank modulation scheme, a recent work by Sala and Dolecek explored the study of constrained codes for permutations. The constraint studied by them is inherited from the inter-cell interference phenomenon in flash memories, where high-level cells can inadvertently increase the level of low-level cells. In this paper, the model studied by Sala and Dolecek is extended into two constraints. A permutation $σ\in S_n$ satisfies the \emph{two-neighbor $k$-constraint} if for all $2 \leq i \leq n-1$ either $|σ(i-1)-σ(i)|\leq k$ or $|σ(i)-σ(i+1)|\leq k$, and it satisfies the \emph{asymmetric two-neighbor $k$-constraint} if for all $2 \leq i \leq n-1$, either $σ(i-1)-σ(i) < k$ or $σ(i+1)-σ(i) < k$. We show that the capacity of the first constraint is $(1+ε)/2$ when $k=Θ(n^ε)$ and the capacity of the second constraint is 1 regardless of the value of $k$. We also extend our results and study the capacity of these two constraints combined with error-correction codes in the Kendall's $τ$ metric.
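The two constraints translate directly into predicates on a permutation. The Python sketch below assumes the permutation is given as a list whose entry at index $i-1$ is $σ(i)$; the function names are illustrative.

```python
def satisfies_two_neighbor(perm, k):
    """Two-neighbor k-constraint: every interior element is within k of
    its left neighbor or of its right neighbor."""
    return all(
        abs(perm[i - 1] - perm[i]) <= k or abs(perm[i] - perm[i + 1]) <= k
        for i in range(1, len(perm) - 1)
    )


def satisfies_asymmetric_two_neighbor(perm, k):
    """Asymmetric two-neighbor k-constraint: for every interior element,
    at least one neighbor exceeds it by less than k."""
    return all(
        perm[i - 1] - perm[i] < k or perm[i + 1] - perm[i] < k
        for i in range(1, len(perm) - 1)
    )
```

For the permutation `[1, 4, 2, 5, 3]`, the two-neighbor constraint fails for `k = 1` (the element 4 differs from its neighbors by 3 and 2) and holds for `k = 2`. The asymmetric constraint fails for `k = 1` at the element 2, whose neighbors 4 and 5 exceed it by 2 and 3, and holds for `k = 3`.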

preprint2014arXiv

Construction of Partial MDS (PMDS) and Sector-Disk (SD) Codes with Two Global Parity Symbols

Partial MDS (PMDS) codes are erasure codes combining local (row) correction with global additional correction of entries, while Sector-Disk (SD) codes are erasure codes that address the mixed failure mode of current RAID systems. It has been an open problem to construct general codes that have the PMDS and the SD properties, and previous work has relied on Monte-Carlo searches. In this paper, we present a general construction that addresses the case of any number of failed disks and in addition, two erased sectors. The construction requires a modest field size. This result generalizes previous constructions extending RAID 5 and RAID 6.

preprint2014arXiv

Generalized Sphere Packing Bound

Kulkarni and Kiyavash recently introduced a new method to establish upper bounds on the size of deletion-correcting codes. This method is based upon tools from hypergraph theory. The deletion channel is represented by a hypergraph whose edges are the deletion balls (or spheres), so that a deletion-correcting code becomes a matching in this hypergraph. Consequently, a bound on the size of such a code can be obtained from bounds on the matching number of a hypergraph. Classical results in hypergraph theory are then invoked to compute an upper bound on the matching number as a solution to a linear-programming problem. The method by Kulkarni and Kiyavash can be applied not only for the deletion channel but also for other error channels. This paper studies this method in its most general setup. First, it is shown that if the error channel is regular and symmetric then this upper bound coincides with the sphere packing bound and thus is called the generalized sphere packing bound. Even though this bound is explicitly given by a linear programming problem, finding its exact value may still be a challenging task. In order to simplify the complexity of the problem, we present a technique based upon graph automorphisms that in many cases reduces the number of variables and constraints in the problem. We then apply this method on specific examples of error channels. We start with the $Z$ channel and show how to exactly find the generalized sphere packing bound for this setup. Next studied is the non-binary limited magnitude channel both for symmetric and asymmetric errors, where we focus on the single-error case. We follow up on the deletion and grain-error channels and show how to improve upon the existing upper bounds for single deletion/error. Finally, we apply this method for projective spaces and find its generalized sphere packing bound for the single-error case.

preprint2014arXiv

Rank-Modulation Rewrite Coding for Flash Memories

The current flash memory technology focuses on the cost minimization of its static storage capacity. However, the resulting approach supports a relatively small number of program-erase cycles. This technology is effective for consumer devices (e.g., smartphones and cameras) where the number of program-erase cycles is small. However, it is not economical for enterprise storage systems that require a large number of lifetime writes. The proposed approach in this paper for alleviating this problem consists of the efficient integration of two key ideas: (i) improving reliability and endurance by representing the information using relative values via the rank modulation scheme and (ii) increasing the overall (lifetime) capacity of the flash device via rewriting codes, namely, performing multiple writes per cell before erasure. This paper presents a new coding scheme that combines rank modulation with rewriting. The key benefits of the new scheme include: (i) the ability to store close to 2 bits per cell on each write with minimal impact on the lifetime of the memory, and (ii) efficient encoding and decoding algorithms that make use of capacity-achieving write-once-memory (WOM) codes that were proposed recently.

preprint2014arXiv

Systematic Codes for Rank Modulation

The goal of this paper is to construct systematic error-correcting codes for permutations and multi-permutations in the Kendall's $τ$-metric. These codes are important in new applications such as rank modulation for flash memories. The construction is based on error-correcting codes for multi-permutations and a partition of the set of permutations into error-correcting codes. For a given large enough number of information symbols $k$, and for any integer $t$, we present a construction for ${(k+r,k)}$ systematic $t$-error-correcting codes, for permutations from $S_{k+r}$, with fewer redundancy symbols than in the codes of the known constructions. In particular, for a given $t$ and for sufficiently large $k$ we can obtain $r=t+1$. The same construction is also applied to obtain related systematic error-correcting codes for multi-permutations.

preprint2012arXiv

Coding for the Lee and Manhattan Metrics with Weighing Matrices

This paper has two goals. The first one is to discuss good codes for packing problems in the Lee and Manhattan metrics. The second one is to consider weighing matrices for some of these coding problems. Weighing matrices were considered as building blocks for codes in the Hamming metric in various constructions. In this paper we will consider mainly two types of weighing matrices, namely conference matrices and Hadamard matrices, to construct codes in the Lee (and Manhattan) metric. We will show that these matrices have some desirable properties when considered as generator matrices for codes in these metrics. Two related packing problems will be considered. The first is to find good codes for error-correction (i.e. dense packings of Lee spheres). The second is to transform the space in a way that volumes are preserved and each Lee sphere (or conscribed cross-polytope), in the space, will be transformed to a shape inscribed in a small cube.

preprint2012arXiv

Rewriting Codes for Flash Memories

Flash memory is a non-volatile computer memory comprising blocks of cells, wherein each cell can take on $q$ different values or levels. While increasing the cell level is easy, reducing the level of a cell can be accomplished only by erasing an entire block. Since block erasures are highly undesirable, coding schemes - known as floating codes (or flash codes) and buffer codes - have been designed in order to maximize the number of times that information stored in a flash memory can be written (and re-written) prior to incurring a block erasure. An $(n,k,t)_q$ flash code $C$ is a coding scheme for storing $k$ information bits in $n$ cells in such a way that any sequence of up to $t$ writes can be accommodated without a block erasure. The total number of available level transitions in $n$ cells is $n(q-1)$, and the write deficiency of $C$, defined as $δ(C) = n(q-1)-t$, is a measure of how close the code comes to perfectly utilizing all these transitions. In this paper, we show a construction of flash codes with write deficiency $O(qk\log k)$ if $q \geq \log_2 k$, and at most $O(k\log^2 k)$ otherwise. An $(n,r,\ell,t)_q$ buffer code is a coding scheme for storing a buffer of $r$ $\ell$-ary symbols such that for any sequence of $t$ symbols it is possible to successfully decode the last $r$ symbols that were written. We improve upon a previous upper bound on the maximum number of writes $t$ in the case where there is a single cell to store the buffer. Then, we show how to improve a construction by Jiang et al. that uses multiple cells, where $n\geq 2r$.

preprint2012arXiv

Time-Space Constrained Codes for Phase-Change Memories

Phase-change memory (PCM) is a promising non-volatile solid-state memory technology. A PCM cell stores data by using its amorphous and crystalline states. The cell changes between these two states using high temperature. However, since the cells are sensitive to high temperature, it is important, when programming cells, to balance the heat both in time and space. In this paper, we study the time-space constraint for PCM, which was originally proposed by Jiang et al. A code is called an \emph{$(α,β,p)$-constrained code} if for any $α$ consecutive rewrites and for any segment of $β$ contiguous cells, the total rewrite cost of the $β$ cells over those $α$ rewrites is at most $p$. Here, the cells are binary and the rewrite cost is defined to be the Hamming distance between the current and next memory states. First, we show a general upper bound on the achievable rate of these codes which extends the results of Jiang et al. Then, we generalize their construction for $(α\geq 1, β=1,p=1)$-constrained codes and show another construction for $(α= 1, β\geq 1,p\geq1)$-constrained codes. Finally, we show that these two constructions can be used to construct codes for all values of $α$, $β$, and $p$.
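The $(α,β,p)$ constraint can be checked directly on a recorded sequence of memory states. The Python sketch below assumes binary cells stored as lists of 0/1 values, one list per memory state, and slides a window of $α$ consecutive rewrites by $β$ contiguous cells over the per-cell changes. The function name and the handling of sequences shorter than the window (the window is simply clipped) are assumptions of this sketch.

```python
def is_time_space_constrained(states, alpha, beta, p):
    """True if every window of alpha consecutive rewrites and beta
    contiguous cells has total rewrite cost (number of bit flips) at most p."""
    n = len(states[0])
    # flips[t][j] == 1 exactly when cell j changes during rewrite t.
    flips = [
        [a ^ b for a, b in zip(prev, nxt)]
        for prev, nxt in zip(states, states[1:])
    ]
    for t0 in range(max(1, len(flips) - alpha + 1)):
        for j0 in range(max(1, n - beta + 1)):
            cost = sum(sum(row[j0:j0 + beta]) for row in flips[t0:t0 + alpha])
            if cost > p:
                return False
    return True
```

For the state sequence `[[0,0,0], [1,0,0], [1,1,0], [1,1,1]]`, each rewrite flips one cell, so the sequence is $(1,1,1)$-constrained. It is not $(2,2,1)$-constrained, because the first two rewrites flip two adjacent cells, but it is $(2,2,2)$-constrained.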

preprint2010arXiv

Dense Error-Correcting Codes in the Lee Metric

Several new applications and a number of new mathematical techniques have increased the research on error-correcting codes in the Lee metric in the last decade. In this work we consider several coding problems and constructions of error-correcting codes in the Lee metric. First, we consider constructions of dense error-correcting codes in relatively small dimensions over small alphabets. The second problem we solve is construction of diametric perfect codes with minimum distance four. We will construct such codes over various lengths and alphabet sizes. The third problem is to transfer an n-dimensional Lee sphere with large radius into a shape, with the same volume, located in a relatively small box. Hadamard matrices play an essential role in the solutions for all three problems. A construction of codes based on Hadamard matrices will start our discussion. These codes approach the sphere packing bound for very high rate range and appear to be the best known codes over some sets of parameters.

preprint2010arXiv

High Dimensional Error-Correcting Codes

In this paper we construct multidimensional codes with high dimension. The codes can correct high dimensional errors which either form small clusters or are confined to an area with a small radius. We also consider a small number of errors in a small area. The clusters which are discussed are mainly spheres such as semi-crosses and crosses. Also considered are clusters with a small number of errors such as 2-bursts, two errors in various clusters, and three errors on a line. Our main focus is on the redundancy of the codes when the most dominant parameter is the dimension of the code.

preprint2009arXiv

Multidimensional Flash Codes

Flash memory is a non-volatile computer memory comprised of blocks of cells, wherein each cell can take on q different levels corresponding to the number of electrons it contains. Increasing the cell level is easy; however, reducing a cell level forces all the other cells in the same block to be erased. This erasing operation is undesirable and therefore has to be used as infrequently as possible. We consider the problem of designing codes for this purpose, where k bits are stored using a block of n cells with q levels each. The goal is to maximize the number of bit writes before an erase operation is required. We present an efficient construction of codes that can store an arbitrary number of bits. Our construction can be viewed as an extension to multiple dimensions of the earlier work of Jiang and Bruck, where single-dimensional codes that can store only 2 bits were proposed.

Eitan Yaakobi

46 published items

Analyzing Collection Strategies: A Computational Perspective on the Coupon Collector Problem

Efficient Synthesis for Two-Dimensional Strand Arrays with Row Constraints

Random Access in DNA Storage: Algorithms, Constructions, and Bounds

Reconstructing Reed-Solomon Codes from Multiple Noisy Channel Outputs

Byzantine-Resilient Gradient Coding through Local Gradient Computations

Codes for Constrained Periodicity

Covering Sequences for $\ell$-Tuples

Equivalence of Insertion/Deletion Correcting Codes for $d$-dimensional Arrays

On the Size of Balls and Anticodes of Small Diameter under the Fixed-Length Levenshtein Metric

Reconstruction from Substrings with Partial Overlap

The Input and Output Entropies of the $k$-Deletion/Insertion Channel

Codes over Trees

Correctable Erasure Patterns in Product Topologies

Multi-strand Reconstruction from Substrings

The Capacity of Single-Server Weakly-Private Information Retrieval

The Zero Cubes Free and Cubes Unique Multidimensional Constraints

Array Codes for Functional PIR and Batch Codes

Coding for Sequence Reconstruction for Single Edits

Coding over Sets for DNA Storage

Constrained de Bruijn Codes: Properties, Enumeration, Constructions, and Applications

Covering Codes using Insertions or Deletions

Optimal Reconstruction Codes for Deletion Channels

Partial MDS Codes with Local Regeneration

PIR Codes with Short Block Length

Single-Deletion Single-Substitution Correcting Codes

The Error Probability of Maximum-Likelihood Decoding over Two Deletion Channels

Bounds and Constructions of Codes with Multiple Localities

Codes Correcting a Burst of Deletions or Insertions

Coding for Locality in Reconstructing Permutations

Performance of Multilevel Flash Memories with Different Binary Labelings: A Multi-User Perspective

Binary Linear Locally Repairable Codes

Codes for Partially Stuck-at Memory Cells

Optimal Linear and Cyclic Locally Repairable Codes over Small Fields

PIR with Low Storage Overhead: Coding instead of Replication

When Do WOM Codes Improve the Erasure Factor in Flash Memories?

Constrained Codes for Rank Modulation

Construction of Partial MDS (PMDS) and Sector-Disk (SD) Codes with Two Global Parity Symbols

Generalized Sphere Packing Bound

Rank-Modulation Rewrite Coding for Flash Memories

Systematic Codes for Rank Modulation

Coding for the Lee and Manhattan Metrics with Weighing Matrices

Rewriting Codes for Flash Memories

Time-Space Constrained Codes for Phase-Change Memories

Dense Error-Correcting Codes in the Lee Metric

High Dimensional Error-Correcting Codes

Multidimensional Flash Codes