Source author record

Jehoshua Bruck

Jehoshua Bruck appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Information Theory; math.IT; math.PR; Cryptography and Security; Discrete Mathematics; Distributed, Parallel, and Cluster Computing; Machine Learning; Genomics; Neural and Evolutionary Computing; Computation and Language; Computational Complexity; Data Structures and Algorithms; Formal Languages and Automata Theory; Networking and Internet Architecture

Catalog footprint

What is connected

36 works

14 topics

4 close collaborators


preprint, 2022, arXiv

Correcting $k$ Deletions and Insertions in Racetrack Memory

One of the main challenges in developing racetrack memory systems is the limited precision in controlling the track shifts, which in turn affects the reliability of reading and writing the data. A current proposal for combating deletions in racetrack memories is to use redundant heads per track, resulting in multiple (potentially erroneous) copies, and to recover the data by solving a specialized version of a sequence reconstruction problem. Using this approach, $k$-deletion correcting codes of length $n$, with $d \ge 2$ heads per track and redundancy $\log \log n + 4$, were constructed. However, the known approach requires that $k \le d$, namely, that the number of heads ($d$) is larger than or equal to the number of correctable deletions ($k$). Here we address the question: What is the best redundancy that can be achieved for a $k$-deletion code ($k$ is a constant) if the number of heads is fixed at $d$ (due to implementation constraints)? One of our key results is an answer to this question, namely, we construct codes that can correct $k$ deletions, for any $k$ beyond the known limit of $d$. The code has $4k \log \log n+o(\log \log n)$ redundancy for $k \le 2d-1$. In addition, when $k \ge 2d$, our codes have $2 \lfloor k/d\rfloor \log n+o(\log n)$ redundancy, which we prove is order-wise optimal; specifically, we prove that the redundancy required for correcting $k$ deletions is at least $\lfloor k/d\rfloor \log n+o(\log n)$. The encoding/decoding complexity of our codes is $O(n\log^{2k}n)$. Finally, we ask a general question: What is the optimal redundancy for codes correcting a combination of at most $k$ deletions and insertions in a $d$-head racetrack memory? We prove that the redundancy sufficient to correct a combination of $k$ deletion and insertion errors is similar to the case of $k$ deletion errors.

preprint, 2022, arXiv

On Algebraic Constructions of Neural Networks with Small Weights

Neural gates compute functions based on weighted sums of the input variables. The expressive power of neural gates (number of distinct functions it can compute) depends on the weight sizes and, in general, large weights (exponential in the number of inputs) are required. Studying the trade-offs among the weight sizes, circuit size and depth is a well-studied topic both in circuit complexity theory and the practice of neural computation. We propose a new approach for studying these complexity trade-offs by considering a related algebraic framework. Specifically, given a single linear equation with arbitrary coefficients, we would like to express it using a system of linear equations with smaller (even constant) coefficients. The techniques we developed are based on Siegel's Lemma for the bounds, anti-concentration inequalities for the existential results and extensions of Sylvester-type Hadamard matrices for the constructions. We explicitly construct a constant weight, optimal size matrix to compute the EQUALITY function (checking if two integers expressed in binary are equal). Computing EQUALITY with a single linear equation requires exponentially large weights. In addition, we prove the existence of the best-known weight size (linear) matrices to compute the COMPARISON function (comparing between two integers expressed in binary). In the context of the circuit complexity theory, our results improve the upper bounds on the weight sizes for the best-known circuit sizes for EQUALITY and COMPARISON.
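The remark that a single linear equation can compute EQUALITY only with exponentially large weights can be illustrated with a small check. The sketch below is not the paper's constant-weight construction; it only shows the baseline single-equation form with weights $2^i$, and the function names are illustrative assumptions.

```python
from itertools import product


def equality_single_equation(x_bits, y_bits):
    """Decide x == y with ONE linear equation: sum_i 2^i * (x_i - y_i) == 0.

    The weights 2^i grow exponentially with the number of input bits.
    """
    total = sum((1 << i) * (x - y) for i, (x, y) in enumerate(zip(x_bits, y_bits)))
    return total == 0


def check_all(n):
    """Exhaustively confirm the equation agrees with bitwise equality on n-bit inputs."""
    return all(
        equality_single_equation(x, y) == (x == y)
        for x in product((0, 1), repeat=n)
        for y in product((0, 1), repeat=n)
    )
```

The check works because binary representations are unique: the weighted sums of two bit vectors coincide exactly when the vectors do.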

preprint, 2020, arXiv

Coding for Optimized Writing Rate in DNA Storage

A method for encoding information in DNA sequences is described. The method is based on the precision-resolution framework, and is aimed to work in conjunction with a recently suggested terminator-free template independent DNA synthesis method. The suggested method optimizes the amount of information bits per synthesis time unit, namely, the writing rate. Additionally, the encoding scheme studied here takes into account the existence of multiple copies of the DNA sequence, which are independently distorted. Finally, quantizers for various run-length distributions are designed.

preprint, 2020, arXiv

CodNN -- Robust Neural Networks From Coded Classification

Deep Neural Networks (DNNs) are a revolutionary force in the ongoing information revolution, and yet their intrinsic properties remain a mystery. In particular, it is widely known that DNNs are highly sensitive to noise, whether adversarial or random. This poses a fundamental challenge for hardware implementations of DNNs, and for their deployment in critical applications such as autonomous driving. In this paper we construct robust DNNs via error correcting codes. By our approach, either the data or internal layers of the DNN are coded with error correcting codes, and successful computation under noise is guaranteed. Since DNNs can be seen as a layered concatenation of classification tasks, our research begins with the core task of classifying noisy coded inputs, and progresses towards robust DNNs. We focus on binary data and linear codes. Our main result is that the prevalent parity code can guarantee robustness for a large family of DNNs, which includes the recently popularized binarized neural networks. Further, we show that the coded classification problem has a deep connection to Fourier analysis of Boolean functions. In contrast to existing solutions in the literature, our results do not rely on altering the training process of the DNN, and provide mathematically rigorous guarantees rather than experimental evidence.

preprint, 2020, arXiv

What is the Value of Data? On Mathematical Methods for Data Quality Estimation

Data is one of the most important assets of the information age, and its societal impact is undisputed. Yet, rigorous methods of assessing the quality of data are lacking. In this paper, we propose a formal definition for the quality of a given dataset. We assess a dataset's quality by a quantity we call the expected diameter, which measures the expected disagreement between two randomly chosen hypotheses that explain it, and has recently found applications in active learning. We focus on Boolean hyperplanes, and utilize a collection of Fourier analytic, algebraic, and probabilistic methods to come up with theoretical guarantees and practical solutions for the computation of the expected diameter. We also study the behaviour of the expected diameter on algebraically structured datasets, conduct experiments that validate this notion of quality, and demonstrate the feasibility of our techniques.

preprint, 2016, arXiv

Communication Efficient Secret Sharing

A secret sharing scheme is a method to store information securely and reliably. Particularly, in a threshold secret sharing scheme, a secret is encoded into $n$ shares, such that any set of at least $t_1$ shares suffices to decode the secret, and any set of at most $t_2 < t_1$ shares reveals no information about the secret. Assuming that each party holds a share and a user wishes to decode the secret by receiving information from a set of parties, the question we study is how to minimize the amount of communication between the user and the parties. We show that the necessary amount of communication, termed "decoding bandwidth", decreases as the number of parties that participate in decoding increases. We prove a tight lower bound on the decoding bandwidth, and construct secret sharing schemes achieving the bound. Particularly, we design a scheme that achieves the optimal decoding bandwidth when $d$ parties participate in decoding, universally for all $t_1 \le d \le n$. The scheme is based on Shamir's secret sharing scheme and preserves its simplicity and efficiency. In addition, we consider secure distributed storage where the proposed communication efficient secret sharing schemes further improve disk access complexity during decoding.
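For orientation, here is a minimal sketch of classical Shamir secret sharing, the scheme the abstract builds on. It is not the communication-efficient variant from the paper. The prime field, the function names and the choice $t_2 = t_1 - 1$ are assumptions made for illustration.

```python
import secrets

PRIME = 2**61 - 1  # a Mersenne prime; the field size is an illustrative choice


def make_shares(secret, threshold, num_shares):
    """Split `secret` into points on a random polynomial of degree threshold - 1."""
    coeffs = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Lagrange-interpolate the polynomial at x = 0 from at least `threshold` shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        # Modular inverse via Fermat's little theorem (PRIME is prime).
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret
```

In this plain form the user downloads one full share from each of `threshold` parties; the abstract's point is that contacting more parties allows less total communication.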

preprint, 2016, arXiv

Duplication Distance to the Root for Binary Sequences

We study the tandem duplication distance between binary sequences and their roots. In other words, the quantity of interest is the number of tandem duplication operations of the form $\mathbf{x} = \mathbf{a} \mathbf{b} \mathbf{c} \to \mathbf{y} = \mathbf{a} \mathbf{b} \mathbf{b} \mathbf{c}$, where $\mathbf{x}$ and $\mathbf{y}$ are sequences and $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ are their substrings, needed to generate a binary sequence of length $n$ starting from a square-free sequence from the set $\{0,1,01,10,010,101\}$. This problem is a restricted case of finding the duplication/deduplication distance between two sequences, defined as the minimum number of duplication and deduplication operations required to transform one sequence to the other. We consider both exact and approximate tandem duplications. For exact duplication, denoting the maximum distance to the root of a sequence of length $n$ by $f(n)$, we prove that $f(n)=\Theta(n)$. For the case of approximate duplication, where a $\beta$-fraction of symbols may be duplicated incorrectly, we show that the maximum distance has a sharp transition from linear in $n$ to logarithmic at $\beta=1/2$. We also study the duplication distance to the root for sequences with a given root and for special classes of sequences, namely, the de Bruijn sequences, the Thue-Morse sequence, and the Fibonacci words. The problem is motivated by genomic tandem duplication mutations and the smallest number of tandem duplication events required to generate a given biological sequence.
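The duplication operation and the notion of a root can be sketched as follows. This is a simple greedy deduplication that finds the root of a binary sequence; it does not compute the duplication distance itself, which is a minimum over all duplication histories. The function names are assumptions for illustration.

```python
def tandem_duplicate(seq, start, length):
    """Apply a b c -> a b b c, where b = seq[start:start + length]."""
    block = seq[start:start + length]
    if length <= 0 or len(block) != length:
        raise ValueError("block must lie inside the sequence")
    return seq[:start + length] + block + seq[start + length:]


def deduplicate_once(seq):
    """Remove one copy of some tandem repeat b b, or return None if seq is square-free."""
    n = len(seq)
    for length in range(1, n // 2 + 1):
        for start in range(n - 2 * length + 1):
            if seq[start:start + length] == seq[start + length:start + 2 * length]:
                return seq[:start + length] + seq[start + 2 * length:]
    return None


def root(seq):
    """Deduplicate until no tandem repeat remains."""
    while True:
        shorter = deduplicate_once(seq)
        if shorter is None:
            return seq
        seq = shorter
```

For example, `root("0011100")` yields `"010"`, one of the six square-free binary sequences listed in the abstract.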

preprint, 2016, arXiv

Duplication-Correcting Codes for Data Storage in the DNA of Living Organisms

The ability to store data in the DNA of a living organism has applications in a variety of areas including synthetic biology and watermarking of patented genetically-modified organisms. Data stored in this medium is subject to errors arising from various mutations, such as point mutations, indels, and tandem duplication, which need to be corrected to maintain data integrity. In this paper, we provide error-correcting codes for errors caused by tandem duplications, which create a copy of a block of the sequence and insert it in a tandem manner, i.e., next to the original. In particular, we present two families of codes for correcting errors due to tandem-duplications of a fixed length, the first family can correct any number of errors while the second corrects a bounded number of errors. We also study codes for correcting tandem duplications of length up to a given constant $k$, where we are primarily focused on the cases of $k=2,3$. Finally, we provide a full classification of the sets of lengths allowed in tandem duplication that result in a unique root for all sequences.

preprint, 2016, arXiv

Optimal Rebuilding of Multiple Erasures in MDS Codes

MDS array codes are widely used in storage systems due to their computationally efficient encoding and decoding procedures. An MDS code with $r$ redundancy nodes can correct any $r$ node erasures by accessing all the remaining information in the surviving nodes. However, in practice, $e$ erasures is a more likely failure event, for $1\le e<r$. Hence, a natural question is how much information do we need to access in order to rebuild $e$ storage nodes? We define the rebuilding ratio as the fraction of remaining information accessed during the rebuilding of $e$ erasures. In our previous work we constructed MDS codes, called zigzag codes, that achieve the optimal rebuilding ratio of $1/r$ for the rebuilding of any systematic node when $e=1$, however, all the information needs to be accessed for the rebuilding of the parity node erasure. The (normalized) repair bandwidth is defined as the fraction of information transmitted from the remaining nodes during the rebuilding process. For codes that are not necessarily MDS, Dimakis et al. proposed the regenerating codes framework where any $r$ erasures can be corrected by accessing some of the remaining information, and any $e=1$ erasure can be rebuilt from some subsets of surviving nodes with optimal repair bandwidth. In this work, we study 3 questions on rebuilding of codes: (i) We show a fundamental trade-off between the storage size of the node and the repair bandwidth similar to the regenerating codes framework, and show that zigzag codes achieve the optimal rebuilding ratio of $e/r$ for MDS codes, for any $1\le e\le r$. (ii) We construct systematic codes that achieve optimal rebuilding ratio of $1/r$, for any systematic or parity node erasure. (iii) We present error correction algorithms for zigzag codes, and in particular demonstrate how these codes can be corrected beyond their minimum Hamming distances.

preprint, 2015, arXiv

Asymmetric Error Correction and Flash-Memory Rewriting using Polar Codes

We propose efficient coding schemes for two communication settings: 1. asymmetric channels, and 2. channels with an informed encoder. These settings are important in non-volatile memories, as well as optical and broadcast communication. The schemes are based on non-linear polar codes, and they build on and improve recent work on these settings. In asymmetric channels, we tackle the exponential storage requirement of previously known schemes, that resulted from the use of large Boolean functions. We propose an improved scheme, that achieves the capacity of asymmetric channels with polynomial computational complexity and storage requirement. The proposed non-linear scheme is then generalized to the setting of channel coding with an informed encoder, using a multicoding technique. We consider specific instances of the scheme for flash memories, that incorporate error-correction capabilities together with rewriting. Since the considered codes are non-linear, they eliminate the requirement of previously known schemes (called polar write-once-memory codes) for shared randomness between the encoder and the decoder. Finally, we mention that the multicoding scheme is also useful for broadcast communication in Marton's region, improving upon previous schemes for this setting.

preprint, 2015, arXiv

Capacity and Expressiveness of Genomic Tandem Duplication

The majority of the human genome consists of repeated sequences. An important type of repeated sequences common in the human genome are tandem repeats, where identical copies appear next to each other. For example, in the sequence $AGTC\underline{TGTG}C$, $TGTG$ is a tandem repeat, that may be generated from $AGTCTGC$ by a tandem duplication of length $2$. In this work, we investigate the possibility of generating a large number of sequences from a \textit{seed}, i.e.\ a small initial string, by tandem duplications of bounded length. We study the capacity of such a system, a notion that quantifies the system's generating power. Our results include \textit{exact capacity} values for certain tandem duplication string systems. In addition, motivated by the role of DNA sequences in expressing proteins via RNA and the genetic code, we define the notion of the \textit{expressiveness} of a tandem duplication system as the capability of expressing arbitrary substrings. We then \textit{completely} characterize the expressiveness of tandem duplication systems for general alphabet sizes and duplication lengths. In particular, based on a celebrated result by Axel Thue from 1906, presenting a construction for ternary square-free sequences, we show that for alphabets of size 4 or larger, bounded tandem duplication systems, regardless of the seed and the bound on duplication length, are not fully expressive, i.e. they cannot generate all strings even as substrings of other strings. Note that the alphabet of size 4 is of particular interest as it pertains to the genomic alphabet. Building on this result, we also show that these systems do not have full capacity. In general, our results illustrate that duplication lengths play a more significant role than the seed in generating a large number of sequences for these systems.
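To make the generating process concrete, the sketch below enumerates every string reachable from a seed by tandem duplications of bounded length, up to a length cap. The cap and the function name are assumptions added so the search terminates; the capacity studied in the paper concerns the growth rate as the length goes to infinity, which this enumeration does not compute.

```python
def descendants(seed, max_dup_len, max_len):
    """All strings of length <= max_len obtainable from `seed` by tandem
    duplications whose duplicated block has length <= max_dup_len."""
    seen = {seed}
    frontier = [seed]
    while frontier:
        s = frontier.pop()
        for length in range(1, max_dup_len + 1):
            for start in range(len(s) - length + 1):
                block = s[start:start + length]
                t = s[:start + length] + block + s[start + length:]
                if len(t) <= max_len and t not in seen:
                    seen.add(t)
                    frontier.append(t)
    return seen
```

With the abstract's example, `"AGTCTGTGC"` appears in `descendants("AGTCTGC", 2, 9)` because duplicating the block `TG` is a tandem duplication of length 2.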

preprint, 2015, arXiv

Rewriting Flash Memories by Message Passing

This paper constructs WOM codes that combine rewriting and error correction for mitigating the reliability and the endurance problems in flash memory. We consider a rewriting model that is of practical interest to flash applications where only the second write uses WOM codes. Our WOM code construction is based on binary erasure quantization with LDGM codes, where the rewriting uses message passing and has potential to share the efficient hardware implementations with LDPC codes in practice. We show that the coding scheme achieves the capacity of the rewriting model. Extensive simulations show that the rewriting performance of our scheme compares favorably with that of polar WOM code in the rate region where high rewriting success probability is desired. We further augment our coding schemes with error correction capability. By drawing a connection to the conjugate code pairs studied in the context of quantum error correction, we develop a general framework for constructing error-correction WOM codes. Under this framework, we give an explicit construction of WOM codes whose codewords are contained in BCH codes.

preprint, 2014, arXiv

Explicit MDS Codes for Optimal Repair Bandwidth

MDS codes are erasure-correcting codes that can correct the maximum number of erasures for a given number of redundancy or parity symbols. If an MDS code has $r$ parities and no more than $r$ erasures occur, then by transmitting all the remaining data in the code, the original information can be recovered. However, it was shown that in order to recover a single symbol erasure, only a fraction of $1/r$ of the information needs to be transmitted. This fraction is called the repair bandwidth (fraction). Explicit code constructions were given in previous works. If we view each symbol in the code as a vector or a column over some field, then the code forms a 2D array and such codes are especially widely used in storage systems. In this paper, we address the following question: given the length of the column $l$, number of parities $r$, can we construct high-rate MDS array codes with optimal repair bandwidth of $1/r$, whose code length is as long as possible? In this paper, we give code constructions such that the code length is $(r+1)\log_r l$.

preprint, 2014, arXiv

Rank-Modulation Rewrite Coding for Flash Memories

The current flash memory technology focuses on the cost minimization of its static storage capacity. However, the resulting approach supports a relatively small number of program-erase cycles. This technology is effective for consumer devices (e.g., smartphones and cameras) where the number of program-erase cycles is small. However, it is not economical for enterprise storage systems that require a large number of lifetime writes. The proposed approach in this paper for alleviating this problem consists of the efficient integration of two key ideas: (i) improving reliability and endurance by representing the information using relative values via the rank modulation scheme and (ii) increasing the overall (lifetime) capacity of the flash device via rewriting codes, namely, performing multiple writes per cell before erasure. This paper presents a new coding scheme that combines rank modulation with rewriting. The key benefits of the new scheme include: (i) the ability to store close to 2 bits per cell on each write with minimal impact on the lifetime of the memory, and (ii) efficient encoding and decoding algorithms that make use of capacity-achieving write-once-memory (WOM) codes that were proposed recently.

preprint, 2014, arXiv

Rate-Distortion for Ranking with Incomplete Information

We study the rate-distortion relationship in the set of permutations endowed with the Kendall Tau metric and the Chebyshev metric. Our study is motivated by the application of permutation rate-distortion to the average-case and worst-case analysis of algorithms for ranking with incomplete information and approximate sorting algorithms. For the Kendall Tau metric we provide bounds for small, medium, and large distortion regimes, while for the Chebyshev metric we present bounds that are valid for all distortions and are especially accurate for small distortions. In addition, for the Chebyshev metric, we provide a construction for covering codes.
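The two metrics named above can be computed directly. The sketch assumes permutations are given as equal-length sequences of distinct values (numeric values for the Chebyshev case); the function names are illustrative.

```python
from itertools import combinations


def kendall_tau_distance(p, q):
    """Number of pairs ordered differently in p and q, which equals the minimum
    number of adjacent transpositions needed to turn p into q."""
    pos_q = {v: i for i, v in enumerate(q)}
    mapped = [pos_q[v] for v in p]
    return sum(1 for i, j in combinations(range(len(mapped)), 2) if mapped[i] > mapped[j])


def chebyshev_distance(p, q):
    """Largest coordinate-wise difference, max_i |p[i] - q[i]|."""
    return max((abs(a - b) for a, b in zip(p, q)), default=0)
```

Reversing a permutation of three elements inverts all three pairs, so the Kendall tau distance between `(1, 2, 3)` and `(3, 2, 1)` is 3, while their Chebyshev distance is 2.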

preprint, 2014, arXiv

Systematic Codes for Rank Modulation

The goal of this paper is to construct systematic error-correcting codes for permutations and multi-permutations in the Kendall's $\tau$-metric. These codes are important in new applications such as rank modulation for flash memories. The construction is based on error-correcting codes for multi-permutations and a partition of the set of permutations into error-correcting codes. For a given large enough number of information symbols $k$, and for any integer $t$, we present a construction for ${(k+r,k)}$ systematic $t$-error-correcting codes, for permutations from $S_{k+r}$, with fewer redundancy symbols than the codes of the known constructions. In particular, for a given $t$ and for sufficiently large $k$ we can obtain $r=t+1$. The same construction is also applied to obtain related systematic error-correcting codes for multi-permutations.

preprint, 2014, arXiv

The Capacity of String-Replication Systems

It is known that the majority of the human genome consists of repeated sequences. Furthermore, it is believed that a significant part of the rest of the genome also originated from repeated sequences and has mutated to its current form. In this paper, we investigate the possibility of constructing an exponentially large number of sequences from a short initial sequence and simple replication rules, including those resembling genomic replication processes. In other words, our goal is to find out the capacity, or the expressive power, of these string-replication systems. Our results include exact capacities, and bounds on the capacities, of four fundamental string-replication systems.

preprint, 2013, arXiv

Access vs. Bandwidth in Codes for Storage

Maximum distance separable (MDS) codes are widely used in storage systems to protect against disk (node) failures. A node is said to have capacity $l$ over some field $\mathbb{F}$, if it can store that amount of symbols of the field. An $(n,k,l)$ MDS code uses $n$ nodes of capacity $l$ to store $k$ information nodes. The MDS property guarantees the resiliency to any $n-k$ node failures. An \emph{optimal bandwidth} (resp. \emph{optimal access}) MDS code communicates (resp. accesses) the minimum amount of data during the repair process of a single failed node. It was shown that this amount equals a fraction of $1/(n-k)$ of data stored in each node. In previous optimal bandwidth constructions, $l$ scaled polynomially with $k$ in codes with asymptotic rate $<1$. Moreover, in constructions with a constant number of parities, i.e. rate approaches 1, $l$ is scaled exponentially w.r.t. $k$. In this paper, we focus on the latter case of constant number of parities $n-k=r$, and ask the following question: Given the capacity of a node $l$, what is the largest number of information disks $k$ in an optimal bandwidth (resp. access) $(k+r,k,l)$ MDS code? We give an upper bound for the general case, and two tight bounds in the special cases of two important families of codes. Moreover, the bounds show that in some cases an optimal-bandwidth code has larger $k$ than an optimal-access code, and therefore these two measures are not equivalent.

preprint, 2013, arXiv

Systematic Error-Correcting Codes for Rank Modulation

The rank-modulation scheme has been recently proposed for efficiently storing data in nonvolatile memories. Error-correcting codes are essential for rank modulation; however, existing results have been limited. In this work we explore a new approach, \emph{systematic error-correcting codes for rank modulation}. Systematic codes have the benefits of enabling efficient information retrieval and potentially supporting more efficient encoding and decoding procedures. We study systematic codes for rank modulation under Kendall's $\tau$-metric as well as under the $\ell_\infty$-metric. In Kendall's $\tau$-metric we present $[k+2,k,3]$-systematic codes for correcting one error, which have optimal rates, unless systematic perfect codes exist. We also study the design of multi-error-correcting codes, and provide two explicit constructions, one resulting in $[n+1,k+1,2t+2]$ systematic codes with redundancy at most $2t+1$. We use non-constructive arguments to show the existence of $[n,k,n-k]$-systematic codes for general parameters. Furthermore, we prove that for rank modulation, systematic codes achieve the same capacity as general error-correcting codes. Finally, in the $\ell_\infty$-metric we construct two $[n,k,d]$ systematic multi-error-correcting codes, the first for the case of $d=O(1)$, and the second for $d=\Theta(n)$. In the latter case, the codes have the same asymptotic rate as the best codes currently known in this metric.

preprint, 2012, arXiv

A Universal Scheme for Transforming Binary Algorithms to Generate Random Bits from Loaded Dice

In this paper, we present a universal scheme for transforming an arbitrary algorithm for biased 2-face coins to generate random bits from the general source of an m-sided die, hence enabling the application of existing algorithms to general sources. In addition, we study approaches of efficiently generating a prescribed number of random bits from an arbitrary biased coin. This contrasts with most existing works, which typically assume that the number of coin tosses is fixed, and they generate a variable number of random bits.

preprint, 2012, arXiv

Balanced Modulation for Nonvolatile Memories

This paper presents a practical writing/reading scheme in nonvolatile memories, called balanced modulation, for minimizing the asymmetric component of errors. The main idea is to encode data using a balanced error-correcting code. When reading information from a block, the scheme adjusts the reading threshold such that the resulting word is also balanced or approximately balanced. Balanced modulation has suboptimal performance for any cell-level distribution and it can be easily implemented in the current systems of nonvolatile memories. Furthermore, we study the construction of balanced error-correcting codes, in particular, balanced LDPC codes. These codes have very efficient encoding and decoding algorithms, and they are more efficient than prior constructions of balanced error-correcting codes.
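The threshold-adjusting read can be sketched in a few lines. This is a hedged illustration of the idea only: the analog cell levels, the midpoint rule for placing the threshold, and the function name are assumptions, and no error-correcting code is applied.

```python
def balanced_read(levels):
    """Read a block by placing the threshold between the two middle cell levels,
    so that half the cells read as 1 and half as 0 (when the levels are distinct)."""
    n = len(levels)
    if n == 0 or n % 2:
        raise ValueError("expected a non-empty block with an even number of cells")
    ordered = sorted(levels)
    threshold = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    word = [1 if v > threshold else 0 for v in levels]
    return word, threshold
```

If every cell level drifts downward by the same amount, the balanced read still returns the same word, whereas a fixed threshold could misread the weaker cells as 0. Ties at the middle levels give a word that is only approximately balanced.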

preprint, 2012, arXiv

Efficiently Extracting Randomness from Imperfect Stochastic Processes

We study the problem of extracting a prescribed number of random bits by reading the smallest possible number of symbols from non-ideal stochastic processes. The related interval algorithm proposed by Han and Hoshi has asymptotically optimal performance; however, it assumes that the distribution of the input stochastic process is known. The motivation for our work is the fact that, in practice, sources of randomness have inherent correlations and are affected by measurement noise. Namely, it is hard to obtain an accurate estimation of the distribution. This challenge was addressed by the concepts of seeded and seedless extractors that can handle general random sources with unknown distributions. However, known seeded and seedless extractors provide extraction efficiencies that are substantially smaller than Shannon's entropy limit. Our main contribution is the design of extractors that have a variable input-length and a fixed output length, are efficient in the consumption of symbols from the source, are capable of generating random bits from general stochastic processes and approach the information theoretic upper bound on efficiency.

preprint, 2012, arXiv

Linear Transformations for Randomness Extraction

Information-efficient approaches for extracting randomness from imperfect sources have been extensively studied, but simpler and faster ones are required in the high-speed applications of random number generation. In this paper, we focus on linear constructions, namely, applying linear transformation for randomness extraction. We show that linear transformations based on sparse random matrices are asymptotically optimal to extract randomness from independent sources and bit-fixing sources, and they are efficient (may not be optimal) to extract randomness from hidden Markov sources. Further study demonstrates the flexibility of such constructions on source models as well as their excellent information-preserving capabilities. Since linear transformations based on sparse random matrices are computationally fast and can be easy to implement using hardware like FPGAs, they are very attractive in the high-speed applications. In addition, we explore explicit constructions of transformation matrices. We show that the generator matrices of primitive BCH codes are good choices, but linear transformations based on such matrices require more computational time due to their high densities.
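A linear extractor of this kind multiplies the input bit vector by a binary matrix over GF(2). The sketch below uses a sparse random matrix stored as lists of column indices. The density value, the seed and the function names are assumptions for illustration, and the sketch says nothing about how many output bits can safely be extracted from a given source.

```python
import random


def sparse_random_matrix(n_in, n_out, density, seed=0):
    """One row per output bit; each row lists the input positions XORed together.
    Each position is included independently with probability `density`."""
    rng = random.Random(seed)
    return [[j for j in range(n_in) if rng.random() < density] for _ in range(n_out)]


def extract(bits, matrix):
    """Matrix-vector product over GF(2): every output bit is a parity of a few inputs."""
    return [sum(bits[j] for j in row) % 2 for row in matrix]
```

Because each output bit touches only a small number of inputs, the work per output bit stays low, which is the property the abstract highlights for hardware such as FPGAs.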

preprint, 2012, arXiv

Nonuniform Codes for Correcting Asymmetric Errors in Data Storage

The construction of asymmetric error correcting codes is a topic that was studied extensively; however, the existing approach for code construction assumes that every codeword should tolerate $t$ asymmetric errors. Our main observation is that in contrast to symmetric errors, asymmetric errors are content dependent. For example, in Z-channels, the all-1 codeword is prone to have more errors than the all-0 codeword. This motivates us to develop nonuniform codes whose codewords can tolerate different numbers of asymmetric errors depending on their Hamming weights. The idea in the construction of nonuniform codes is to augment the redundancy in a content-dependent way and guarantee the worst case reliability while maximizing the code size. In this paper, we first study nonuniform codes for Z-channels, namely, channels that suffer only one type of error, say 1 to 0. Specifically, we derive their upper bounds, analyze their asymptotic performances, and introduce two general constructions. Then we extend the concept and results of nonuniform codes to general binary asymmetric channels, where the error probability for each bit from 0 to 1 is smaller than that from 1 to 0.

preprint, 2012, arXiv

Streaming Algorithms for Optimal Generation of Random Bits

Generating random bits from a source of biased coins (the bias is unknown) is a classical question that was originally studied by von Neumann. There are a number of known algorithms that have asymptotically optimal information efficiency, namely, the expected number of generated random bits per input bit is asymptotically close to the entropy of the source. However, only the original von Neumann algorithm has a `streaming property': it operates on a single input bit at a time and it generates random bits when possible; alas, it does not have an optimal information efficiency. The main contribution of this paper is an algorithm that generates random bit streams from biased coins, uses bounded space and runs in expected linear time. As the size of the allotted space increases, the algorithm approaches the information-theoretic upper bound on efficiency. In addition, we discuss how to extend this algorithm to generate random bit streams from m-sided dice or correlated sources such as Markov chains.
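The original von Neumann procedure mentioned above is short enough to sketch as a generator. This is the classical baseline, not the paper's bounded-space algorithm, and the convention of emitting the first bit of each unequal pair is one of two equivalent choices.

```python
def von_neumann_stream(bits):
    """Consume biased coin flips in pairs: emit a bit for 01 or 10, skip 00 and 11.

    For independent flips with a fixed bias, 01 and 10 are equally likely,
    so the emitted bits are unbiased.
    """
    it = iter(bits)
    for first in it:
        second = next(it, None)
        if second is None:  # a trailing unpaired flip is discarded
            return
        if first != second:
            yield first
```

The generator holds at most one pending input bit, which is the streaming property the abstract refers to; the cost is that equal pairs are thrown away, so the output rate falls short of the entropy of the source.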

preprint2012arXiv

Synthesis of Stochastic Flow Networks

A stochastic flow network is a directed graph with incoming edges (inputs) and outgoing edges (outputs): tokens enter through the input edges, travel stochastically in the network, and can exit the network through the output edges. Each node in the network is a splitter, namely, a token can enter a node through an incoming edge and exit on one of the output edges according to a predefined probability distribution. Stochastic flow networks can be easily implemented by DNA-based chemical reactions, with promising applications in molecular computing and stochastic computing. In this paper, we address a fundamental synthesis question: Given a finite set of possible splitters and an arbitrary rational probability distribution, design a stochastic flow network such that every token that enters the input edge will exit the outputs with the prescribed probability distribution. The problem of probability transformation dates back to von Neumann's 1951 work and was followed, among others, by Knuth and Yao in 1976. Most existing works have focused on the "simulation" of target distributions. In this paper, we design optimal-sized stochastic flow networks for "synthesizing" target distributions. We show that when each splitter has two outgoing edges and is unbiased, an arbitrary rational probability $\frac{a}{b}$ with $a \leq b \leq 2^n$ can be realized by a stochastic flow network of size $n$, which is optimal. Compared to other stochastic systems, feedback (cycles in networks) strongly improves the expressibility of stochastic flow networks.
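
To illustrate why feedback matters, the sketch below computes the exact exit distribution of a small network of unbiased splitters containing one cycle. It is not the paper's synthesis procedure: the dictionary encoding, the function name `exit_distribution`, and the two-node example are assumptions made for illustration, and the solver assumes every token eventually leaves the network.

```python
from fractions import Fraction


def exit_distribution(network, start, outputs):
    """Exact probability that a token entering at `start` leaves on each output.

    `network` maps a node name to a list of (probability, destination) pairs,
    where a destination is either another node name or an output label.
    """
    nodes = list(network)
    idx = {name: i for i, name in enumerate(nodes)}
    m = len(nodes)
    result = {}
    for out in outputs:
        # Augmented linear system (I - T) h = r for this output label.
        rows = []
        for name in nodes:
            row = [Fraction(0)] * (m + 1)
            row[idx[name]] += 1
            for prob, dest in network[name]:
                if dest in idx:
                    row[idx[dest]] -= prob
                elif dest == out:
                    row[m] += prob
            rows.append(row)
        # Gauss-Jordan elimination over the rationals.
        for col in range(m):
            pivot = next(r for r in range(col, m) if rows[r][col] != 0)
            rows[col], rows[pivot] = rows[pivot], rows[col]
            inv = 1 / rows[col][col]
            rows[col] = [v * inv for v in rows[col]]
            for r in range(m):
                if r != col and rows[r][col] != 0:
                    factor = rows[r][col]
                    rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
        result[out] = rows[idx[start]][m]
    return result


if __name__ == "__main__":
    half = Fraction(1, 2)
    # Two fair splitters; B feeds half of its tokens back to A.
    network = {
        "A": [(half, "out1"), (half, "B")],
        "B": [(half, "out0"), (half, "A")],
    }
    print(exit_distribution(network, "A", ["out0", "out1"]))
```

With the cycle from B back to A, two fair splitters realize the probability 2/3 on `out1`. An acyclic network of fair splitters can only produce probabilities with power-of-two denominators, so this value is out of reach without feedback.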

preprint2012arXiv

The Synthesis and Analysis of Stochastic Switching Circuits

Stochastic switching circuits are relay circuits that consist of stochastic switches called pswitches. The study of stochastic switching circuits has widespread applications in many fields of computer science, neuroscience, and biochemistry. In this paper, we discuss several properties of stochastic switching circuits, including robustness, expressibility, and probability approximation. First, we study the robustness, namely, the effect caused by introducing an error of size $\epsilon$ to each pswitch in a stochastic circuit. We analyze two constructions and prove that simple series-parallel circuits are robust to small error perturbations, while general series-parallel circuits are not. Specifically, the total error introduced by perturbations of size less than $\epsilon$ is bounded by a constant multiple of $\epsilon$ in a simple series-parallel circuit, independent of the size of the circuit. Next, we study the expressibility of stochastic switching circuits: Given an integer $q$ and a pswitch set $S=\{\frac{1}{q},\frac{2}{q},\dots,\frac{q-1}{q}\}$, can we synthesize any rational probability with denominator $q^n$ (for arbitrary $n$) with a simple series-parallel stochastic switching circuit? We generalize previous results and prove that when $q$ is a multiple of 2 or 3, the answer is yes. We also show that when $q$ is a prime number larger than 3, the answer is no. Probability approximation is studied for a general case of an arbitrary pswitch set $S=\{s_1,s_2,\dots,s_{|S|}\}$. In this case, we propose an algorithm based on local optimization to approximate any desired probability. The analysis reveals that the approximation error of a switching circuit decreases exponentially with an increasing circuit size.
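
The expressibility question can be made concrete for the simplest case $q=2$. The sketch below assumes independent pswitches, so that a series connection multiplies closure probabilities and a parallel connection gives $p+q-pq$, and it builds a target $a/2^n$ by adding one 1/2-pswitch at a time according to the binary expansion of the target. The function names are hypothetical, and this is an illustration of the simple series-parallel idea rather than the general construction analyzed in the paper.

```python
from fractions import Fraction


def series(p, q):
    """Two independent circuits in series conduct only when both are closed."""
    return p * q


def parallel(p, q):
    """Two independent circuits in parallel conduct when at least one is closed."""
    return p + q - p * q


def realize_dyadic(a, n):
    """Return (steps, probability) for the target a / 2**n using 1/2-pswitches."""
    if not 0 < a < 2 ** n:
        raise ValueError("target must lie strictly between 0 and 1")
    while a % 2 == 0:  # reduce the fraction so the last binary digit is 1
        a //= 2
        n -= 1
    half = Fraction(1, 2)
    bits = [(a >> (n - 1 - i)) & 1 for i in range(n)]  # digits b1..bn of 0.b1...bn
    prob = half  # a single pswitch realizes 0.1 in binary
    steps = ["pswitch"]
    for b in reversed(bits[:-1]):
        if b:
            prob = parallel(half, prob)  # prepends a binary digit 1
            steps.append("parallel")
        else:
            prob = series(half, prob)  # prepends a binary digit 0
            steps.append("series")
    return steps, prob


if __name__ == "__main__":
    print(realize_dyadic(5, 3))
```

For the target 5/8 the sketch uses three pswitches: one pswitch, then one added in series, then one added in parallel.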

preprint2011arXiv

Compressed Encoding for Rank Modulation

Rank modulation has been recently proposed as a scheme for storing information in flash memories. While rank modulation has advantages in improving write speed and endurance, the current encoding approach is based on the "push to the top" operation that is not efficient in the general case. We propose a new encoding procedure where a cell level is raised to be higher than the minimal necessary subset - instead of all - of the other cell levels. This new procedure leads to a significantly more compressed (lower charge levels) encoding. We derive an upper bound for a family of codes that utilize the proposed encoding procedure, and consider code constructions that achieve that bound for several special cases.
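
The difference between the two encoding procedures can be sketched as follows. The code assumes integer charge levels with distinct starting values and a target ranking listed from lowest to highest cell; the function names are hypothetical, and the push-to-the-top baseline is deliberately naive (it pushes every cell except the lowest). It is meant only to show why raising a cell above the minimal necessary subset keeps levels lower, not to reproduce the paper's codes.

```python
def push_to_top(levels, target_order):
    """Baseline: push each cell after the lowest one above the current maximum."""
    levels = list(levels)
    for cell in target_order[1:]:
        levels[cell] = max(levels) + 1
    return levels


def minimal_push(levels, target_order):
    """Raise each cell only as far as needed to exceed the cell ranked just below it."""
    levels = list(levels)
    for below, cell in zip(target_order, target_order[1:]):
        if levels[cell] <= levels[below]:
            levels[cell] = levels[below] + 1
    return levels


def ranking(levels):
    """Cell indices sorted from lowest to highest charge level."""
    return sorted(range(len(levels)), key=levels.__getitem__)


if __name__ == "__main__":
    start = [1, 2, 3]    # cell 0 is lowest, cell 2 is highest
    target = [0, 2, 1]   # desired ranking, lowest to highest
    print(push_to_top(start, target), minimal_push(start, target))
```

In this example both procedures reach the target ranking, but the highest level after the minimal push is 4 instead of 5, and levels never decrease in either case.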

preprint2011arXiv

Generalized Gray Codes for Local Rank Modulation

We consider the local rank-modulation scheme in which a sliding window going over a sequence of real-valued variables induces a sequence of permutations. Local rank-modulation is a generalization of the rank-modulation scheme, which has been recently suggested as a way of storing information in flash memory. We study Gray codes for the local rank-modulation scheme in order to simulate conventional multi-level flash cells while retaining the benefits of rank modulation. Unlike the limited scope of previous works, we consider code constructions for the entire range of parameters including the code length, sliding window size, and overlap between adjacent windows. We show our constructed codes have asymptotically-optimal rate. We also provide efficient encoding, decoding, and next-state algorithms.

preprint2011arXiv

Generating Probability Distributions using Multivalued Stochastic Relay Circuits

The problem of random number generation dates back to von Neumann's work in 1951. Since then, many algorithms have been developed for generating unbiased bits from complex correlated sources as well as for generating arbitrary distributions from unbiased bits. An equally interesting, but less studied aspect is the structural component of random number generation as opposed to the algorithmic aspect. That is, given a network structure imposed by nature or physical devices, how can we build networks that generate arbitrary probability distributions in an optimal way? In this paper, we study the generation of arbitrary probability distributions in multivalued relay circuits, a generalization in which relays can take on any of N states and the logical 'and' and 'or' are replaced with 'min' and 'max' respectively. Previous work was done on two-state relays. We generalize these results, describing a duality property and networks that generate arbitrary rational probability distributions. We prove that these networks are robust to errors and design a universal probability generator which takes input bits and outputs arbitrary binary probability distributions.
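
The min/max generalization can be sketched for two independent multivalued relays. The code below assumes that a series connection corresponds to the logical 'and' (here `min`) and a parallel connection to the logical 'or' (here `max`), with states numbered 0 to N-1; the function name `combine` and the uniform three-state example are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def combine(dist_a, dist_b, op):
    """Distribution of op(A, B) for independent relays with state distributions A and B."""
    out = [Fraction(0)] * len(dist_a)
    for (a, pa), (b, pb) in product(enumerate(dist_a), enumerate(dist_b)):
        out[op(a, b)] += pa * pb
    return out


if __name__ == "__main__":
    uniform = [Fraction(1, 3)] * 3          # a three-state relay, each state equally likely
    print(combine(uniform, uniform, min))   # series-like composition
    print(combine(uniform, uniform, max))   # parallel-like composition
```

For two uniform three-state relays, `min` gives the distribution (5/9, 1/3, 1/9) and `max` gives its mirror image (1/9, 1/3, 5/9).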

preprint2011arXiv

MDS Array Codes with Optimal Rebuilding

MDS array codes are widely used in storage systems to protect data against erasures. We address the rebuilding ratio problem, namely, in the case of erasures, what is the fraction of the remaining information that needs to be accessed in order to rebuild exactly the lost information? It is clear that when the number of erasures equals the maximum number of erasures that an MDS code can correct, the rebuilding ratio is 1 (access all the remaining information). However, the interesting (and more practical) case is when the number of erasures is smaller than the erasure-correcting capability of the code. For example, consider an MDS code that can correct two erasures: What is the smallest amount of information that one needs to access in order to correct a single erasure? Previous work showed that the rebuilding ratio is bounded between 1/2 and 3/4; however, the exact value was left as an open problem. In this paper, we solve this open problem and prove that for the case of a single erasure with a 2-erasure correcting code, the rebuilding ratio is 1/2. In general, we construct a new family of $r$-erasure correcting MDS array codes that has an optimal rebuilding ratio of $\frac{1}{r}$ in the case of a single erasure. Our array codes have efficient encoding and decoding algorithms (for the case $r=2$ they use a finite field of size 3) and an optimal update property.

preprint2011arXiv

On Codes for Optimal Rebuilding Access

MDS (maximum distance separable) array codes are widely used in storage systems due to their computationally efficient encoding and decoding procedures. An MDS code with r redundancy nodes can correct any r erasures by accessing (reading) all the remaining information in both the systematic nodes and the parity (redundancy) nodes. However, in practice, a single erasure is the most likely failure event; hence, a natural question is how much information do we need to access in order to rebuild a single storage node? We define the rebuilding ratio as the fraction of remaining information accessed during the rebuilding of a single erasure. In our previous work we showed that the optimal rebuilding ratio of 1/r is achievable (using our newly constructed array codes) for the rebuilding of any systematic node, however, all the information needs to be accessed for the rebuilding of the parity nodes. Namely, constructing array codes with a rebuilding ratio of 1/r was left as an open problem. In this paper, we solve this open problem and present array codes that achieve the lower bound of 1/r for rebuilding any single systematic or parity node.

preprint2011arXiv

Zigzag Codes: MDS Array Codes with Optimal Rebuilding

MDS array codes are widely used in storage systems to protect data against erasures. We address the rebuilding ratio problem, namely, in the case of erasures, what is the fraction of the remaining information that needs to be accessed in order to rebuild exactly the lost information? It is clear that when the number of erasures equals the maximum number of erasures that an MDS code can correct then the rebuilding ratio is 1 (access all the remaining information). However, the interesting and more practical case is when the number of erasures is smaller than the erasure correcting capability of the code. For example, consider an MDS code that can correct two erasures: What is the smallest amount of information that one needs to access in order to correct a single erasure? Previous work showed that the rebuilding ratio is bounded between 1/2 and 3/4, however, the exact value was left as an open problem. In this paper, we solve this open problem and prove that for the case of a single erasure with a 2-erasure correcting code, the rebuilding ratio is 1/2. In general, we construct a new family of $r$-erasure correcting MDS array codes that has optimal rebuilding ratio of $\frac{e}{r}$ in the case of $e$ erasures, $1 \le e \le r$. Our array codes have efficient encoding and decoding algorithms (for the case $r=2$ they use a finite field of size 3) and an optimal update property.

preprint2010arXiv

Efficient Generation of Random Bits from Finite State Markov Chains

The problem of random number generation from an uncorrelated random source (of unknown probability distribution) dates back to von Neumann's 1951 work. Elias (1972) generalized von Neumann's scheme and showed how to achieve optimal efficiency in unbiased random bit generation. Hence, a natural question is: what if the sources are correlated? Both Elias and Samuelson proposed methods for generating unbiased random bits in the case of correlated sources (of unknown probability distribution); specifically, they considered finite Markov chains. However, their proposed methods are not efficient or have implementation difficulties. Blum (1986) devised an algorithm for efficiently generating random bits from degree-2 finite Markov chains in expected linear time; however, his beautiful method is still far from optimal in information efficiency. In this paper, we generalize Blum's algorithm to arbitrary-degree finite Markov chains and combine it with Elias's method for efficient generation of unbiased bits. As a result, we provide the first known algorithm that generates unbiased random bits from an arbitrary finite Markov chain, operates in expected linear time and achieves the information-theoretic upper bound on efficiency.

preprint2010arXiv

Rebuilding for Array Codes in Distributed Storage Systems

In distributed storage systems that use coding, the issue of minimizing the communication required to rebuild a storage node after a failure arises. We consider the problem of repairing an erased node in a distributed storage system that uses an EVENODD code. EVENODD codes are maximum distance separable (MDS) array codes that are used to protect against erasures, and only require XOR operations for encoding and decoding. We show that when there are two redundancy nodes, to rebuild one erased systematic node, only 3/4 of the information needs to be transmitted. Interestingly, in many cases, the required disk I/O is also minimized.

preprint2010arXiv

Trajectory Codes for Flash Memory

Flash memory is well-known for its inherent asymmetry: the flash-cell charge levels are easy to increase but are hard to decrease. In a general rewriting model, the stored data changes its value with certain patterns. The patterns of data updates are determined by the data structure and the application, and are independent of the constraints imposed by the storage medium. Thus, an appropriate coding scheme is needed so that the data changes can be updated and stored efficiently under the storage-medium's constraints. In this paper, we define the general rewriting problem using a graph model. It extends many known rewriting models such as floating codes, WOM codes, buffer codes, etc. We present a new rewriting scheme for flash memories, called the trajectory code, for rewriting the stored data as many times as possible without block erasures. We prove that the trajectory code is asymptotically optimal in a wide range of scenarios. We also present randomized rewriting codes optimized for expected performance (given arbitrary rewriting sequences). Our rewriting codes are shown to be asymptotically optimal.

36 published item(s)

Correcting $k$ Deletions and Insertions in Racetrack Memory

On Algebraic Constructions of Neural Networks with Small Weights

Coding for Optimized Writing Rate in DNA Storage

CodNN -- Robust Neural Networks From Coded Classification

What is the Value of Data? On Mathematical Methods for Data Quality Estimation

Communication Efficient Secret Sharing

Duplication Distance to the Root for Binary Sequences

Duplication-Correcting Codes for Data Storage in the DNA of Living Organisms

Optimal Rebuilding of Multiple Erasures in MDS Codes

Asymmetric Error Correction and Flash-Memory Rewriting using Polar Codes

Capacity and Expressiveness of Genomic Tandem Duplication

Rewriting Flash Memories by Message Passing

Explicit MDS Codes for Optimal Repair Bandwidth

Rank-Modulation Rewrite Coding for Flash Memories

Rate-Distortion for Ranking with Incomplete Information

Systematic Codes for Rank Modulation

The Capacity of String-Replication Systems

Access vs. Bandwidth in Codes for Storage

Systematic Error-Correcting Codes for Rank Modulation

A Universal Scheme for Transforming Binary Algorithms to Generate Random Bits from Loaded Dice

Balanced Modulation for Nonvolatile Memories

Efficiently Extracting Randomness from Imperfect Stochastic Processes

Linear Transformations for Randomness Extraction

Nonuniform Codes for Correcting Asymmetric Errors in Data Storage

Streaming Algorithms for Optimal Generation of Random Bits

Synthesis of Stochastic Flow Networks

The Synthesis and Analysis of Stochastic Switching Circuits

Compressed Encoding for Rank Modulation

Generalized Gray Codes for Local Rank Modulation

Generating Probability Distributions using Multivalued Stochastic Relay Circuits

MDS Array Codes with Optimal Rebuilding

On Codes for Optimal Rebuilding Access

Zigzag Codes: MDS Array Codes with Optimal Rebuilding

Efficient Generation of Random Bits from Finite State Markov Chains

Rebuilding for Array Codes in Distributed Storage Systems

Trajectory Codes for Flash Memory