Source author record

Arya Mazumdar

Arya Mazumdar appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Information Theory; math.IT; Machine Learning; Data Structures and Algorithms; Discrete Mathematics; Distributed, Parallel, and Cluster Computing; eess.SP; Cryptography and Security; Databases; Information Retrieval; math.PR; math.ST; Networking and Internet Architecture; Statistics Theory

Catalog footprint

What is connected

33 works

14 topics

4 close collaborators

preprint 2026 arXiv

Approximate Distributed Coded Computing: Polynomial Codes and Randomized Sketching

Coded computing is a distributed paradigm that uses coding theory to introduce redundancy and overcome bottlenecks in large-scale systems. In the same vein, randomized numerical linear algebra employs probabilistic methods to compress and accelerate linear algebraic operations, addressing challenges in high-dimensional data analysis. This article reviews the foundations of both fields and presents distributed schemes that combine techniques from both to speed up optimization and machine learning algorithms in the presence of slow or non-responsive servers. Along the way, we touch on various related topics and mathematical concepts.
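The redundancy idea can be illustrated with a minimal sketch, which is not the article's polynomial-code or sketching scheme. It assumes a toy (3, 2) parity code: a matrix is split into two row blocks, a third worker holds their sum, and the product with a vector is recovered from any two workers. The function names `encode` and `decode` are hypothetical.

```python
from itertools import combinations

def matvec(rows, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]

def encode(a1, a2):
    """Return the three worker tasks of a (3, 2) parity code: A1, A2, A1 + A2."""
    parity = [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(a1, a2)]
    return {0: a1, 1: a2, 2: parity}

def decode(results):
    """Recover [A1 x, A2 x] from the results of any two of the three workers."""
    if 0 in results and 1 in results:
        y1, y2 = results[0], results[1]
    elif 0 in results:  # have A1 x and (A1 + A2) x
        y1 = results[0]
        y2 = [p - a for p, a in zip(results[2], y1)]
    else:  # have A2 x and (A1 + A2) x
        y2 = results[1]
        y1 = [p - b for p, b in zip(results[2], y2)]
    return y1 + y2

a1 = [[1, 2], [3, 4]]
a2 = [[5, 6], [7, 8]]
x = [1, -1]
tasks = encode(a1, a2)
expected = matvec(a1 + a2, x)

# Whichever single worker straggles, the other two suffice.
for fast in combinations(range(3), 2):
    results = {w: matvec(tasks[w], x) for w in fast}
    assert decode(results) == expected
```

The cost of tolerating one straggler here is 50% extra computation; the schemes reviewed in the article trade redundancy against straggler tolerance more generally.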

preprint 2026 arXiv

Binary Iterative Hard Thresholding Converges with Optimal Number of Measurements for 1-Bit Compressed Sensing

Compressed sensing has been a very successful high-dimensional signal acquisition and recovery technique that relies on linear operations. However, the actual measurements of signals have to be quantized before storing or processing. One-bit (1-bit) compressed sensing is a heavily quantized version of compressed sensing, where each linear measurement of a signal is reduced to just one bit: the sign of the measurement. Once enough such measurements are collected, the recovery problem in 1-bit compressed sensing aims to find the original signal with as much accuracy as possible. The recovery problem is related to the traditional "halfspace-learning" problem in learning theory. For recovery of sparse vectors, a popular reconstruction method from 1-bit measurements is the binary iterative hard thresholding (BIHT) algorithm. The algorithm is a simple projected sub-gradient descent method, and is known to converge well empirically, despite the nonconvexity of the problem. The convergence property of BIHT was not theoretically justified, except with an exorbitantly large number of measurements (i.e., a number of measurements greater than $\max\{k^{10}, 24^{48}, k^{3.5}/ε\}$, where $k$ is the sparsity, $ε$ denotes the approximation error, and even this expression hides other factors). In this paper we show that the BIHT algorithm converges with only $\tilde{O}(\frac{k}{ε})$ measurements. Note that this dependence on $k$ and $ε$ is optimal for any recovery method in 1-bit compressed sensing. With this result, to the best of our knowledge, BIHT is the only practical and efficient (polynomial time) algorithm that requires the optimal number of measurements in all parameters (both $k$ and $ε$). This is also an example of a gradient descent algorithm converging to the correct solution for a nonconvex problem, under suitable structural conditions.
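A minimal sketch of the projected sub-gradient iteration described above, in plain Python. The step size, iteration count, Gaussian measurement matrix, and problem dimensions are all assumptions made for illustration; the paper's analysis may use different choices.

```python
import math
import random

def sign(v):
    return 1.0 if v >= 0 else -1.0

def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and zero the rest."""
    keep = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:k]
    out = [0.0] * len(v)
    for i in keep:
        out[i] = v[i]
    return out

def normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v] if norm > 0 else v

def biht(a, y, k, step=1.0, iters=50):
    """Binary iterative hard thresholding on sign measurements y = sign(a x)."""
    m, n = len(a), len(a[0])
    x = [0.0] * n
    for _ in range(iters):
        # Mismatch between observed signs and signs of the current estimate.
        resid = [y[i] - sign(sum(a[i][j] * x[j] for j in range(n))) for i in range(m)]
        # Sub-gradient step, then projection onto k-sparse unit vectors.
        grad = [sum(a[i][j] * resid[i] for i in range(m)) for j in range(n)]
        x = normalize(hard_threshold([x[j] + step / m * grad[j] for j in range(n)], k))
    return x

rng = random.Random(0)
n, m, k = 20, 200, 2  # illustrative sizes, not taken from the paper
truth = normalize([1.0, -1.0] + [0.0] * (n - k))
a = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
y = [sign(sum(r * t for r, t in zip(row, truth))) for row in a]
estimate = biht(a, y, k)
```

Because sign measurements discard magnitude, only the direction of the signal is identifiable, which is why each iterate is renormalized to unit length.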

preprint 2022 arXiv

Decentralized Competing Bandits in Non-Stationary Matching Markets

Understanding complex dynamics of two-sided online matching markets, where the demand-side agents compete to match with the supply-side (arms), has recently received substantial interest. To that end, in this paper, we introduce the framework of decentralized two-sided matching markets under non-stationary (dynamic) environments. We adhere to the serial dictatorship setting, where the demand-side agents have unknown and different preferences over the supply-side (arms), but the arms have fixed and known preferences over the agents. We propose and analyze a decentralized and asynchronous learning algorithm, namely Decentralized Non-stationary Competing Bandits (DNCB), where the agents play (restrictive) successive-elimination-type learning algorithms to learn their preferences over the arms. The complexity in understanding such a system stems from the fact that the competing bandits choose their actions in an asynchronous fashion, and the lower ranked agents only get to learn from a set of arms not dominated by the higher ranked agents, which leads to forced exploration. With carefully defined complexity parameters, we characterize this forced exploration and obtain sub-linear (logarithmic) regret of DNCB. Furthermore, we validate our theoretical findings via experiments.

preprint 2022 arXiv

Lower Bounds on the Total Variation Distance Between Mixtures of Two Gaussians

Mixtures of high dimensional Gaussian distributions have been studied extensively in statistics and learning theory. While the total variation distance appears naturally in the sample complexity of distribution learning, it is analytically difficult to obtain tight lower bounds for mixtures. Exploiting a connection between total variation distance and the characteristic function of the mixture, we provide fairly tight functional approximations. This enables us to derive new lower bounds on the total variation distance between pairs of two-component Gaussian mixtures that have a shared covariance matrix.
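One standard way in which characteristic functions bound total variation from below is worked out here as an illustration; the paper's functional approximations may be sharper or take a different form. The symbols $P$, $Q$, $p$, $q$, $\varphi_P$, $\varphi_Q$ are notation introduced for this sketch: two distributions on $\mathbb{R}^d$, their densities, and their characteristic functions.

```latex
\begin{align*}
\left|\varphi_P(t) - \varphi_Q(t)\right|
  &= \left|\int_{\mathbb{R}^d} e^{i\langle t, x\rangle}\bigl(p(x) - q(x)\bigr)\,dx\right| \\
  &\le \int_{\mathbb{R}^d} \left|p(x) - q(x)\right|\,dx
   = 2\, d_{\mathrm{TV}}(P, Q),
\end{align*}
so that
\[
  d_{\mathrm{TV}}(P, Q) \;\ge\; \tfrac{1}{2}\,\sup_{t \in \mathbb{R}^d}
  \left|\varphi_P(t) - \varphi_Q(t)\right| .
\]
```

Since the characteristic function of a Gaussian mixture has a closed form, any choice of $t$ gives an explicit lower bound.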

preprint 2022 arXiv

On Learning Mixture of Linear Regressions in the Non-Realizable Setting

While mixture of linear regressions (MLR) is a well-studied topic, prior works usually do not analyze such models for prediction error. In fact, prediction and loss are not well-defined in the context of mixtures. In this paper, first we show that MLR can be used for prediction where instead of predicting a label, the model predicts a list of values (also known as list-decoding). The list size is equal to the number of components in the mixture, and the loss function is defined to be the minimum among the losses incurred by the component models. We show that with this definition, a solution of the empirical risk minimization (ERM) achieves small probability of prediction error. This calls for an algorithm to minimize the empirical risk for MLR, which is known to be computationally hard. Prior algorithmic works in MLR focus on the realizable setting, i.e., recovery of parameters when data is probabilistically generated by a mixed linear (noisy) model. In this paper we show that a version of the popular alternating minimization (AM) algorithm finds the best fit lines in a dataset even when a realizable model is not assumed, under some regularity conditions on the dataset and the initial points, and thereby provides a solution for the ERM. We further provide an algorithm that runs in polynomial time in the number of datapoints, and recovers a good approximation of the best fit lines. The two algorithms are experimentally compared.
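The min-loss and the alternating minimization loop can be sketched as follows. This is a toy version under loud assumptions: one-dimensional features, lines through the origin, squared loss, and a hand-picked initialization; it is not the paper's exact algorithm or its regularity conditions.

```python
def list_loss(slopes, x, y):
    """Min-loss of a list prediction: the best squared error among the components."""
    return min((y - w * x) ** 2 for w in slopes)

def alternating_minimization(points, slopes, iters=10):
    """Alternate between assigning points to their best line and refitting each line."""
    slopes = list(slopes)
    for _ in range(iters):
        groups = [[] for _ in slopes]
        for x, y in points:
            best = min(range(len(slopes)), key=lambda j: (y - slopes[j] * x) ** 2)
            groups[best].append((x, y))
        for j, group in enumerate(groups):
            sxx = sum(x * x for x, _ in group)
            if sxx > 0:  # keep the old slope if a line received no usable points
                slopes[j] = sum(x * y for x, y in group) / sxx
    return slopes

# Points lying exactly on y = 2x and y = -x (made-up data).
points = [(1, 2), (2, 4), (3, 6), (1, -1), (2, -2), (3, -3)]
fitted = alternating_minimization(points, slopes=[1.5, -0.5])
risk = sum(list_loss(fitted, x, y) for x, y in points) / len(points)
```

With this initialization every point is assigned to its generating line in the first pass, so the refit slopes are 2 and -1 and the empirical min-loss risk is zero.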

preprint 2022 arXiv

Support Recovery in Mixture Models with Sparse Parameters

Mixture models are widely used to fit complex and multimodal datasets. In this paper we study mixtures with high-dimensional sparse latent parameter vectors and consider the problem of support recovery of those vectors. While parameter learning in mixture models is well-studied, the sparsity constraint remains relatively unexplored. Sparsity of parameter vectors is a natural constraint in a variety of settings, and support recovery is a major step towards parameter estimation. We provide efficient algorithms for support recovery that have a logarithmic sample complexity dependence on the dimensionality of the latent space. Our algorithms are quite general: they are applicable to 1) mixtures of many different canonical distributions including Uniform, Poisson, Laplace, Gaussians, etc., and 2) mixtures of linear regressions and linear classifiers with Gaussian covariates under different assumptions on the unknown parameters. In most of these settings, our results are the first guarantees on the problem, while in the rest, our results provide improvements on existing works.

preprint 2020 arXiv

A workload-adaptive mechanism for linear queries under local differential privacy

We propose a new mechanism to accurately answer a user-provided set of linear counting queries under local differential privacy (LDP). Given a set of linear counting queries (the workload), our mechanism automatically adapts to provide accuracy on the workload queries. We define a parametric class of mechanisms that produce unbiased estimates of the workload, and formulate a constrained optimization problem to select a mechanism from this class that minimizes expected total squared error. We solve this optimization problem numerically using projected gradient descent and provide an efficient implementation that scales to large workloads. We demonstrate the effectiveness of our optimization-based approach in a wide variety of settings, showing that it outperforms many competitors, even outperforming existing mechanisms on the workloads for which they were intended.
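To make "unbiased estimates of the workload" concrete, here is a sketch using k-ary randomized response, a standard LDP baseline. It is not the optimized mechanism proposed in the paper; the prefix workload and the histogram are made-up values, and the function names are hypothetical.

```python
import math

def rr_probabilities(epsilon, d):
    """Return (p, q): probability of reporting the true value, and of each other value."""
    denom = math.exp(epsilon) + d - 1
    return math.exp(epsilon) / denom, 1.0 / denom

def expected_reports(freqs, epsilon):
    """Expected distribution of reports when the true distribution is freqs."""
    p, q = rr_probabilities(epsilon, len(freqs))
    return [q + (p - q) * f for f in freqs]

def debias(reports, epsilon):
    """Unbiased estimate of the true distribution from the distribution of reports."""
    p, q = rr_probabilities(epsilon, len(reports))
    return [(r - q) / (p - q) for r in reports]

def answer_workload(workload, hist):
    """Evaluate each linear counting query (a row of the workload) on a histogram."""
    return [sum(w * h for w, h in zip(row, hist)) for row in workload]

true_hist = [0.5, 0.3, 0.2]
prefix_workload = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]

# Debiasing the expected reports recovers the true histogram, i.e. the estimator is unbiased.
estimate = debias(expected_reports(true_hist, epsilon=1.0), epsilon=1.0)
answers = answer_workload(prefix_workload, estimate)
```

Randomized response ignores the workload entirely; the paper's contribution is to choose the mechanism so that the variance of these unbiased answers is minimized for the given workload.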

preprint 2020 arXiv

Algebraic and Analytic Approaches for Parameter Learning in Mixture Models

We present two different approaches for parameter learning in several mixture models in one dimension. Our first approach uses complex-analytic methods and applies to Gaussian mixtures with shared variance, binomial mixtures with shared success probability, and Poisson mixtures, among others. An example result is that $\exp(O(N^{1/3}))$ samples suffice to exactly learn a mixture of $k<N$ Poisson distributions, each with integral rate parameters bounded by $N$. Our second approach uses algebraic and combinatorial tools and applies to binomial mixtures with shared trial parameter $N$ and differing success parameters, as well as to mixtures of geometric distributions. Again, as an example, for binomial mixtures with $k$ components and success parameters discretized to resolution $ε$, $O(k^2(N/ε)^{8/\sqrt{ε}})$ samples suffice to exactly recover the parameters. For some of these distributions, our results represent the first guarantees for parameter estimation.

preprint 2020 arXiv

Connectivity in Random Annulus Graphs and the Geometric Block Model

We provide new connectivity results for vertex-random graphs or random annulus graphs, which are significant generalizations of random geometric graphs. Random geometric graphs (RGG) are one of the most basic models of random graphs for spatial networks, proposed by Gilbert in 1961, shortly after the introduction of the Erdős-Rényi random graphs. They resemble social networks in many ways (e.g. by spontaneously creating clusters of nodes with high modularity). The connectivity properties of RGGs have been studied since their introduction, and analyzing them has been significantly harder than their Erdős-Rényi counterparts due to correlated edge formation. Our next contribution is in using the connectivity of random annulus graphs to provide necessary and sufficient conditions for efficient recovery of communities for the geometric block model (GBM). The GBM is a probabilistic model for community detection defined over an RGG in a similar spirit as the popular stochastic block model, which is defined over an Erdős-Rényi random graph. The geometric block model inherits the transitivity properties of RGGs and thus models communities better than a stochastic block model. However, analyzing them requires fresh perspectives as all prior tools fail due to correlation in edge formation. We provide a simple and efficient algorithm that can recover communities in GBM exactly with high probability in the regime of connectivity.
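For readers unfamiliar with the baseline model, the sketch below builds a plain random geometric graph (an edge whenever two points are within a fixed radius) and checks connectivity by breadth-first search. It covers only the basic RGG, not the random annulus graph or the GBM recovery algorithm; the points and radii are fixed, made-up values so the outcome is reproducible.

```python
import math
from collections import deque

def geometric_graph(points, radius):
    """Connect two points whenever their Euclidean distance is at most radius."""
    adj = {i: set() for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= radius:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def is_connected(adj):
    """Breadth-first search from vertex 0; connected if every vertex is reached."""
    if not adj:
        return True
    seen, queue = {0}, deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u] - seen:
            seen.add(v)
            queue.append(v)
    return len(seen) == len(adj)

points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.9, 0.9)]
small = geometric_graph(points, radius=0.15)  # the far point stays isolated
large = geometric_graph(points, radius=1.2)   # the far point joins the chain
```

Edges here depend on shared point positions, so they are correlated (two neighbours of the same vertex are likely to be close to each other), which is the transitivity property the abstract refers to.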

preprint 2020 arXiv

Recovery of Sparse Signals from a Mixture of Linear Samples

Mixture of linear regressions is a popular learning theoretic model that is used widely to represent heterogeneous data. In the simplest form, this model assumes that the labels are generated from either of two different linear models and mixed together. Recent works of Yin et al. and Krishnamurthy et al., 2019, focus on an experimental design setting of model recovery for this problem. It is assumed that the features can be designed and queried to obtain their labels. When queried, an oracle randomly selects one of the two different sparse linear models and generates a label accordingly. How many such oracle queries are needed to recover both of the models simultaneously? This question can also be thought of as a generalization of the well-known compressed sensing problem (Candès and Tao, 2005, Donoho, 2006). In this work, we address this query complexity problem and provide efficient algorithms that improve on the previously best known results.

preprint 2020 arXiv

Reliable Distributed Clustering with Redundant Data Assignment

In this paper, we present distributed generalized clustering algorithms that can handle large scale data across multiple machines in spite of straggling or unreliable machines. We propose a novel data assignment scheme that enables us to obtain global information about the entire data even when some machines fail to respond with the results of the assigned local computations. The assignment scheme leads to distributed algorithms with good approximation guarantees for a variety of clustering and dimensionality reduction problems.

preprint 2020 arXiv

Storage Capacity of Repairable Networks

In this paper, we introduce a model of a distributed storage system that is locally recoverable from any single server failure. Unlike the usual local recovery model of codes for distributed storage, this model accounts for the fact that each server or storage node in a network can connect to only some, not all, of the other nodes. This may happen for reasons such as physical separation, inhomogeneity in storage platforms, etc. We estimate the storage capacity of both undirected and directed networks under this model and propose some constructive schemes. From a coding theory point of view, we show that this model is approximately dual to the well-studied index coding problem. Further in this paper, we extend the above model to handle multiple server failures. Among other results, we provide an upper bound on the minimum pairwise distance of a set of words that can be stored in a graph with the local repair guarantee. The well-known impossibility bounds on the distance of locally recoverable codes follow from our result.

preprint 2016 arXiv

Associative Memory using Dictionary Learning and Expander Decoding

An associative memory is a framework of content-addressable memory that stores a collection of message vectors (or a dataset) over a neural network while enabling a neurally feasible mechanism to recover any message in the dataset from its noisy version. Designing an associative memory requires addressing two main tasks: 1) learning phase: given a dataset, learn a concise representation of the dataset in the form of a graphical model (or a neural network), 2) recall phase: given a noisy version of a message vector from the dataset, output the correct message vector via a neurally feasible algorithm over the network learnt during the learning phase. This paper studies the problem of designing a class of neural associative memories which learns a network representation for a large dataset that ensures correction against a large number of adversarial errors during the recall phase. Specifically, the associative memories designed in this paper can store a dataset containing $\exp(n)$ $n$-length message vectors over a network with $O(n)$ nodes and can tolerate $Ω(\frac{n}{{\rm polylog} n})$ adversarial errors. This paper carries out this memory design by mapping the learning phase and recall phase to the tasks of dictionary learning with a square dictionary and iterative error correction in an expander code, respectively.

preprint 2016 arXiv

Bounds on the Rate of Linear Locally Repairable Codes over Small Alphabets

Locally repairable codes (LRC) have recently been a subject of intense research due to theoretical appeal and their application in distributed storage systems. In an LRC, any coordinate of a codeword can be recovered by accessing only a few other coordinates. For LRCs over a small alphabet (such as binary), the optimal rate-distance trade-off is unknown. In this paper we provide the tightest known upper bound on the rate of linear LRCs of a given relative distance, an improvement over any previous result, in particular \cite{cadambe2013upper}.

preprint 2016 arXiv

Clustering Via Crowdsourcing

In recent years, crowdsourcing, also known as human-aided computation, has emerged as an effective platform for solving problems that are considered complex for machines alone. Using humans is time-consuming and costly due to monetary compensations. Therefore, a crowd-based algorithm must judiciously use any information computed through an automated process, and ask a minimum number of questions to the crowd adaptively. One such problem which has received significant attention is entity resolution. Formally, we are given a graph $G=(V,E)$ with unknown edge set $E$ where $G$ is a union of $k$ (again unknown, but typically large $O(n^α)$, for $α>0$) disjoint cliques $G_i(V_i, E_i)$, $i =1, \dots, k$. The goal is to retrieve the sets $V_i$ by making a minimum number of pairwise queries $V \times V\to\{\pm1\}$ to an oracle (the crowd). When the answer to each query is correct, e.g. via resampling, then this reduces to finding connected components in a graph. On the other hand, when crowd answers may be incorrect, it corresponds to clustering over a minimum number of noisy inputs. Even with perfect answers, a simple lower and upper bound of $Θ(nk)$ on query complexity can be shown. A major contribution of this paper is to reduce the query complexity to linear or even sublinear in $n$ when mild side information is provided by a machine, and even in the presence of crowd errors which are not correctable via resampling. We develop new information theoretic lower bounds on the query complexity of clustering with side information and errors, and our upper bounds closely match them. Our algorithms are naturally parallelizable, and also give near-optimal bounds on the number of adaptive rounds required to match the query complexity.
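The simple $O(nk)$ upper bound for perfect answers can be sketched as follows: each new item is compared with one representative of every cluster found so far. This is the baseline without side information or noise, not the paper's improved algorithm; the oracle here is simulated from made-up ground-truth labels.

```python
def cluster_with_oracle(items, same_cluster):
    """Cluster items with a perfect pairwise oracle, using at most n * k queries."""
    clusters = []
    queries = 0
    for item in items:
        for cluster in clusters:
            queries += 1
            if same_cluster(cluster[0], item):  # compare with one representative
                cluster.append(item)
                break
        else:
            clusters.append([item])  # no existing cluster matched
    return clusters, queries

# Hypothetical ground truth standing in for the crowd.
labels = {"a": 0, "b": 1, "c": 0, "d": 2, "e": 1}
clusters, queries = cluster_with_oracle(list(labels), lambda u, v: labels[u] == labels[v])
```

On this input the procedure uses 6 queries, well within the $nk = 15$ bound, and returns the three true clusters.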

preprint 2016 arXiv

Cooperative Local Repair in Distributed Storage

Erasure-correcting codes that support local repair of codeword symbols have attracted substantial attention recently for their application in distributed storage systems. This paper investigates a generalization of the usual locally repairable codes. In particular, this paper studies a class of codes with the following property: any small set of codeword symbols can be reconstructed (repaired) from a small number of other symbols. This is referred to as cooperative local repair. The main contribution of this paper is bounds on the trade-off of the minimum distance and the dimension of such codes, as well as explicit constructions of families of codes that enable cooperative local repair. Some other results regarding cooperative local repair are also presented, including an analysis for the well-known Hadamard/Simplex codes.

preprint 2016 arXiv

Local Partial Clique Covers for Index Coding

Index coding, or broadcasting with side information, is a network coding problem of most fundamental importance. In this problem, given a directed graph, each vertex represents a user with a need of information, and the neighborhood of each vertex represents the side information available to that user. The aim is to find an encoding with the minimum number of bits (optimal rate) that, when broadcast, suffices to meet the need of every user. Not only is the optimal rate intractable, but it is also very hard to characterize with some other well-studied graph parameter or with a simpler formulation, such as a linear program. Recently there have been a series of works that address this question and provide explicit schemes for index coding as the optimal value of a linear program with rate given by well-studied properties such as local chromatic number or partial clique-covering number. There has been a recent attempt to combine these existing notions of local chromatic number and partial clique covering into a unified notion denoted as the local partial clique cover (Arbabjolfaei and Kim, 2014). We present a generalized novel upper bound (encoding scheme), in the form of the minimum value of a linear program, for optimal index coding. Our bound also combines the notions of local chromatic number and partial clique covering into a new definition of the local partial clique cover, which outperforms both of the previous bounds and also improves on the previous attempt at combining them. Further, we look at the upper bound derived recently by Thapa et al., 2015, and extend their $n$-$\mathsf{GIC}$ (Generalized Interlinked Cycle) construction to $(k,n)$-$\mathsf{GIC}$ graphs, which are a generalization of $k$-partial cliques.

preprint 2016 arXiv

Nonadaptive group testing with random set of defectives

In a group testing scheme, a set of tests is designed to identify a small number $t$ of defective items that are present among a large number $N$ of items. Each test takes as input a group of items and produces a binary output indicating whether any defective item is present in the group. In the non-adaptive setting, designing a testing scheme is equivalent to the construction of a disjunct matrix, an $M \times N$ binary matrix where the union of supports of any $t$ columns does not contain the support of any other column. In this paper we consider the scenario where defective items are random and follow simple probability distributions. In particular we consider the cases where 1) each item can be defective independently with probability $\frac{t}{N}$ and 2) each $t$-set of items can be defective with uniform probability. In both cases our aim is to design a testing matrix that successfully identifies the set of defectives with high probability. Both of these models have been studied in the literature before and it is known that $O(t\log N)$ tests are necessary as well as sufficient (via random coding) in both cases. Our main focus is explicit deterministic construction of test matrices suited to the above scenarios. One of the most popular ways of constructing test matrices relies on constant-weight error-correcting codes and their minimum distance. We go beyond the minimum distance analysis and connect the average distance of a constant weight code to the parameters of the resulting test matrix. With our relaxed requirements, we show that using explicit constant-weight codes (e.g., based on algebraic geometry codes) we may achieve a number of tests equal to $O(t \frac{\log^2 N}{ \log t})$ for both the first and the second cases.
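The disjunctness condition and the way it enables decoding can be checked on a toy instance. The sketch below assumes a small $4 \times 6$ matrix whose columns are the weight-2 subsets of four tests, which is 1-disjunct; it is an illustration, not one of the paper's constructions. Decoding clears every item that appears in a negative test.

```python
from itertools import combinations

def run_tests(matrix, defectives):
    """A test is positive when its group contains at least one defective item."""
    return [any(row[j] for j in defectives) for row in matrix]

def decode(matrix, outcomes):
    """Clear every item that appears in a negative test; return the rest."""
    n = len(matrix[0])
    cleared = set()
    for row, positive in zip(matrix, outcomes):
        if not positive:
            cleared.update(j for j in range(n) if row[j])
    return set(range(n)) - cleared

def is_disjunct(matrix, t):
    """Check that no column's support lies inside the union of any t other columns."""
    n = len(matrix[0])
    support = [{r for r, row in enumerate(matrix) if row[j]} for j in range(n)]
    for group in combinations(range(n), t):
        union = set().union(*(support[j] for j in group))
        for j in range(n):
            if j not in group and support[j] <= union:
                return False
    return True

# Columns are all 2-element subsets of 4 tests, so no support contains another.
pairs = list(combinations(range(4), 2))
matrix = [[1 if r in pair else 0 for pair in pairs] for r in range(4)]
recovered = decode(matrix, run_tests(matrix, {3}))
```

The same matrix fails to be 2-disjunct, which shows how quickly the exact requirement becomes demanding and why relaxing it to "most" defective sets can save tests.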

preprint 2016 arXiv

Security in Locally Repairable Storage

In this paper we extend the notion of locally repairable codes to secret sharing schemes. The main problem that we consider is to find optimal ways to distribute shares of a secret among a set of storage-nodes (participants) such that the content of each node (share) can be recovered by using the contents of only a few other nodes, and at the same time the secret can be reconstructed by only some allowable subsets of nodes. As a special case, an eavesdropper observing some set of specific nodes (such as fewer than a certain number of nodes) does not get any information. In other words, we propose to study a locally repairable distributed storage system that is secure against a passive eavesdropper that can observe some subsets of nodes. We provide a number of results related to such systems including upper bounds and achievability results on the number of bits that can be securely stored with these constraints.

preprint 2015 arXiv

An Upper Bound On the Size of Locally Recoverable Codes

In a locally recoverable or repairable code, any symbol of a codeword can be recovered by reading only a small (constant) number of other symbols. The notion of local recoverability is important in the area of distributed storage where the most frequent error event is a single storage node failure (erasure). A common objective is to repair the node by downloading data from as few other storage nodes as possible. In this paper, we bound the minimum distance of a code in terms of its length, size and locality. Unlike previous bounds, our bound follows from a significantly simpler analysis and depends on the size of the alphabet being used. It turns out that the binary Simplex codes satisfy our bound with equality; hence the Simplex codes are the first example of an optimal binary locally repairable code family. We also provide achievability results based on random coding and concatenated codes that are numerically verified to be close to our bounds.
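Local recoverability of the binary Simplex code can be verified directly on the smallest interesting case. The sketch below assumes the $[7, 3]$ Simplex code, with coordinates indexed by the nonzero vectors of $\mathbb{F}_2^3$ stored as bit masks; it only demonstrates that every symbol is the XOR of two others (locality 2) and does not reproduce the paper's bound.

```python
M = 3
columns = list(range(1, 2 ** M))  # nonzero vectors of F_2^3 as bit masks

def parity(v):
    return bin(v).count("1") % 2

def encode(message):
    """Simplex codeword: coordinate v holds the inner product <message, v> over F_2."""
    return [parity(message & v) for v in columns]

def repair(codeword, lost):
    """Recover coordinate `lost` from two other coordinates a, b with a XOR b = v."""
    v = columns[lost]
    a = next(u for u in columns if u != v)
    b = v ^ a  # nonzero and different from v, so it indexes another coordinate
    return codeword[columns.index(a)] ^ codeword[columns.index(b)]

codewords = [encode(m) for m in range(2 ** M)]
# The code is linear, so its minimum distance is its minimum nonzero weight.
min_distance = min(sum(c) for c in codewords if any(c))
repairs_ok = all(repair(c, i) == c[i] for c in codewords for i in range(len(columns)))
```

Repair works because the inner product is linear: $\langle m, a\rangle + \langle m, b\rangle = \langle m, a \oplus b\rangle$ over $\mathbb{F}_2$.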

preprint 2015 arXiv

Compression in the Space of Permutations

We investigate lossy compression (source coding) of data in the form of permutations. This problem has direct applications in the storage of ordinal data or rankings, and in the analysis of sorting algorithms. We analyze the rate-distortion characteristic for the permutation space under the uniform distribution, and the minimum achievable rate of compression that allows a bounded distortion after recovery. Our analysis is with respect to different practical and useful distortion measures, including Kendall-tau distance, Spearman's footrule, Chebyshev distance and inversion-$\ell_1$ distance. We establish equivalence of source code designs under certain distortions and show simple explicit code designs that incur low encoding/decoding complexities and are asymptotically optimal. Finally, we show that for the Mallows model, a popular nonuniform ranking model on the permutation space, both the entropy and the maximum distortion at zero rate are much lower than the uniform counterparts, which motivates the future design of efficient compression schemes for this model.
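Three of the distortion measures named above can be computed as follows. The sketch treats a permutation as a vector of ranks, which is one common convention and an assumption here; the inversion-$\ell_1$ distance is omitted.

```python
def kendall_tau(p, q):
    """Number of index pairs that the two rank vectors order in opposite ways."""
    n = len(p)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if (p[i] - p[j]) * (q[i] - q[j]) < 0
    )

def spearman_footrule(p, q):
    """l1 distance between the rank vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))

def chebyshev(p, q):
    """l-infinity distance between the rank vectors."""
    return max(abs(a - b) for a, b in zip(p, q))

p = [1, 2, 3, 4]
q = [2, 1, 4, 3]
distances = (kendall_tau(p, q), spearman_footrule(p, q), chebyshev(p, q))
```

For this pair the Kendall tau distance is 2, the footrule is 4, and the Chebyshev distance is 1, so the same pair of rankings can look close or far depending on the measure.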

preprint 2015 arXiv

Restricted isometry property of random subdictionaries

We study statistical restricted isometry, a property closely related to sparse signal recovery, of deterministic sensing matrices of size $m \times N$. A matrix is said to have a statistical restricted isometry property (StRIP) of order $k$ if most submatrices with $k$ columns define a near-isometric map of ${\mathbb R}^k$ into ${\mathbb R}^m$. As our main result, we establish sufficient conditions for the StRIP property of a matrix in terms of the mutual coherence and mean square coherence. We show that for many existing deterministic families of sampling matrices, $m=O(k)$ rows suffice for $k$-StRIP, which is an improvement over the known estimates of either $m = Θ(k \log N)$ or $m = Θ(k\log k)$. We also give examples of matrix families that are shown to have the StRIP property using our sufficient conditions.
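The mutual coherence that appears in the sufficient conditions is straightforward to compute: it is the largest absolute inner product between distinct unit-normalized columns. The sketch below uses a tiny made-up matrix; the mean square coherence is not shown because its exact normalization is not given in the abstract.

```python
import math

def mutual_coherence(matrix):
    """Largest absolute inner product between distinct unit-normalized columns."""
    cols = list(zip(*matrix))
    norms = [math.sqrt(sum(c * c for c in col)) for col in cols]
    best = 0.0
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            inner = sum(a * b for a, b in zip(cols[i], cols[j]))
            best = max(best, abs(inner) / (norms[i] * norms[j]))
    return best

# Columns (1, 0), (0, 1), (1, 1): the third makes a 45-degree angle with the others.
matrix = [[1, 0, 1],
          [0, 1, 1]]
mu = mutual_coherence(matrix)
```

Unlike the restricted isometry property, coherence can be checked in time quadratic in the number of columns, which is what makes coherence-based conditions attractive for deterministic matrices.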

preprint 2014 arXiv

On a Duality Between Recoverable Distributed Storage and Index Coding

In this paper, we introduce a model of a single-failure locally recoverable distributed storage system. This model appears to give rise to a problem seemingly dual to the well-studied index coding problem. We establish the relation between the dimensions of an optimal index code and an optimal distributed storage code of our model. We also show some extensions to vector codes.

preprint 2014 arXiv

On the Capacity of Memoryless Adversary

In this paper, we study a model of communication under adversarial noise. In this model, the adversary makes online decisions on whether to corrupt a transmitted bit based on only the value of that bit. Like the usual binary symmetric channel of information theory or the fully adversarial channel of combinatorial coding theory, the adversary can, with high probability, introduce at most a given fraction of error. It is shown that the capacity (maximum rate of reliable information transfer) of such a memoryless adversary is strictly below that of the binary symmetric channel. We give a new upper bound on the capacity of such a channel; the tightness of this upper bound remains an open question. The main component of our proof is the careful examination of error-correcting properties of a code with skewed distance distribution.

preprint 2013 arXiv

Random Subdictionaries and Coherence Conditions for Sparse Signal Recovery

The most frequently used condition for sampling matrices employed in compressive sampling is the restricted isometry property (RIP) of the matrix when restricted to sparse signals. At the same time, imposing this condition makes it difficult to find explicit matrices that support recovery of signals from sketches of the optimal (smallest possible) dimension. A number of attempts have been made to relax or replace the RIP in sparse recovery algorithms. We focus on the relaxation under which the near-isometry property holds for most rather than for all submatrices of the sampling matrix, known as the statistical RIP or StRIP condition. We show that sampling matrices of dimensions $m\times N$ with maximum coherence $μ=O((k\log^3 N)^{-1/4})$ and mean square coherence $\bar μ^2=O(1/(k\log N))$ support stable recovery of $k$-sparse signals using Basis Pursuit. These assumptions are satisfied in many examples. As a result, we are able to construct sampling matrices that support recovery with low error for sparsity $k$ higher than $\sqrt{m}$, which exceeds the range of parameters of the known classes of RIP matrices.

preprint 2013 arXiv

Update-Efficiency and Local Repairability Limits for Capacity Approaching Codes

Motivated by distributed storage applications, we investigate the degree to which capacity achieving encodings can be efficiently updated when a single information bit changes, and the degree to which such encodings can be efficiently (i.e., locally) repaired when a single encoded bit is lost. Specifically, we first develop conditions under which optimum error-correction and update-efficiency are possible, and establish that the number of encoded bits that must change in response to a change in a single information bit must scale logarithmically in the block-length of the code if we are to achieve any nontrivial rate with vanishing probability of error over the binary erasure or binary symmetric channels. Moreover, we show there exist capacity-achieving codes with this scaling. With respect to local repairability, we develop tight upper and lower bounds on the number of remaining encoded bits that are needed to recover a single lost bit of the encoding. In particular, we show that if the code-rate is $ε$ less than the capacity, then for optimal codes, the maximum number of codeword symbols required to recover one lost symbol must scale as $\log(1/ε)$. Several variations on, and extensions of, these results are also developed.

preprint 2012 arXiv

Construction of Almost Disjunct Matrices for Group Testing

In a group testing scheme, a set of tests is designed to identify a small number $t$ of defective items among a large set (of size $N$) of items. In the non-adaptive scenario the set of tests has to be designed in one shot. In this setting, designing a testing scheme is equivalent to the construction of a disjunct matrix, an $M \times N$ matrix where the union of supports of any $t$ columns does not contain the support of any other column. In principle, one wants to have such a matrix with the minimum possible number $M$ of rows (tests). One of the main ways of constructing disjunct matrices relies on constant weight error-correcting codes and their minimum distance. In this paper, we consider a relaxed definition of a disjunct matrix known as an almost disjunct matrix. This concept is also studied under the name of weakly separated design in the literature. The relaxed definition allows one to come up with group testing schemes where a close-to-one fraction of all possible sets of defective items are identifiable. Our main contribution is twofold. First, we go beyond the minimum distance analysis and connect the average distance of a constant weight code to the parameters of an almost disjunct matrix constructed from it. Our second contribution is to explicitly construct almost disjunct matrices, based on our average distance analysis, that have a much smaller number of rows than any previous explicit construction of disjunct matrices. The parameters of our construction can be varied to cover a large range of relations for $t$ and $N$.

preprint2011arXiv

Constructions of Rank Modulation Codes

Rank modulation is a way of encoding information to correct errors in flash memory devices as well as impulse noise in transmission lines. Modeling rank modulation involves construction of packings of the space of permutations equipped with the Kendall tau distance. We present several general constructions of codes in permutations that cover a broad range of code parameters. In particular, we show a number of ways in which conventional error-correcting codes can be modified to correct errors in the Kendall space. Codes that we construct afford simple encoding and decoding algorithms of essentially the same complexity as required to correct errors in the Hamming metric. For instance, from binary BCH codes we obtain codes correcting $t$ Kendall errors in $n$ memory cells that support the order of $n!/(\log_2 n!)^t$ messages, for any constant $t = 1, 2, \dots$. We also construct families of codes that correct a number of errors that grows with $n$ at varying rates, from $Θ(n)$ to $Θ(n^{2})$. One of our constructions gives rise to a family of rank modulation codes for which the trade-off between the number of messages and the number of correctable Kendall errors approaches the optimal scaling rate. Finally, we list a number of possibilities for constructing codes of finite length, and give examples of rank modulation codes with specific parameters.
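
For readers unfamiliar with the metric named above, the Kendall tau distance between two permutations is the minimum number of adjacent transpositions needed to turn one into the other, which equals the number of pairs the two permutations order differently. A minimal sketch follows; the function name `kendall_tau_distance` is chosen here for illustration, and the quadratic pair count is used for clarity rather than speed.

```python
def kendall_tau_distance(p, q):
    """Number of pairs ordered differently by permutations p and q.

    Both arguments are sequences containing the same distinct elements.
    """
    # Relabel p by the position each element occupies in q, so that the
    # distance becomes the inversion count of a single sequence.
    position_in_q = {value: index for index, value in enumerate(q)}
    seq = [position_in_q[value] for value in p]
    return sum(
        1
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
        if seq[i] > seq[j]
    )


print(kendall_tau_distance([2, 1, 3], [1, 2, 3]))  # one adjacent swap apart
print(kendall_tau_distance([3, 2, 1], [1, 2, 3]))  # reversal of three elements
```

The largest possible distance for $n$ elements is $n(n-1)/2$, attained by a reversal, which is why the number of correctable errors discussed above can grow as fast as $Θ(n^{2})$.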

preprint2011arXiv

On the Number of Errors Correctable with Codes on Graphs

We study ensembles of codes on graphs (generalized low-density parity-check, or LDPC codes) constructed from random graphs and fixed local constrained codes, and their extension to codes on hypergraphs. It is known that the average minimum distance of codes in these ensembles grows linearly with the code length. We show that these codes can correct a linearly growing number of errors under simple iterative decoding algorithms. In particular, we show that this property extends to codes constructed by parallel concatenation of Hamming codes and other codes with small minimum distance. Previously known results that proved this property for graph codes relied on graph expansion and required the choice of local codes with large distance relative to their length.

preprint2010arXiv

Codes in Permutations and Error Correction for Rank Modulation

Codes for rank modulation have been recently proposed as a means of protecting flash memory devices from errors. We study basic coding theoretic problems for such codes, representing them as subsets of the set of permutations of $n$ elements equipped with the Kendall tau distance. We derive several lower and upper bounds on the size of codes. These bounds enable us to establish the exact scaling of the size of optimal codes for large values of $n$. We also show the existence of codes whose size is within a constant factor of the sphere packing bound for any fixed number of errors.

preprint2010arXiv

Coding for High-Density Recording on a 1-D Granular Magnetic Medium

In terabit-density magnetic recording, several bits of data can be replaced by the values of their neighbors in the storage medium. As a result, errors in the medium are dependent on each other and also on the data written. We consider a simple one-dimensional combinatorial model of this medium. In our model, we assume a setting where binary data is sequentially written on the medium and a bit can erroneously change to the immediately preceding value. We derive several properties of codes that correct errors of this type, focusing on bounds on their cardinality. We also define a probabilistic finite-state channel model of the storage medium, and derive lower and upper estimates of its capacity. A lower bound is derived by evaluating the symmetric capacity of the channel, i.e., the maximum transmission rate under the assumption of the uniform input distribution of the channel. An upper bound is found by showing that the original channel is a stochastic degradation of another, related channel model whose capacity we can compute explicitly.

preprint2009arXiv

On linear balancing sets

Let $n$ be an even positive integer and $F$ be the field $\mathrm{GF}(2)$. A word in $F^n$ is called balanced if its Hamming weight is $n/2$. A subset $C \subseteq F^n$ is called a balancing set if for every word $y \in F^n$ there is a word $x \in C$ such that $y + x$ is balanced. It is shown that most linear subspaces of $F^n$ of dimension slightly larger than $\frac{3}{2}\log_2(n)$ are balancing sets. A generalization of this result to linear subspaces that are "almost balancing" is also presented. On the other hand, it is shown that the problem of deciding whether a given set of vectors in $F^n$ spans a balancing set is NP-hard. An application of linear balancing sets is presented for designing efficient error-correcting coding schemes in which the codewords are balanced.
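
The definitions in this abstract can be checked directly for very small $n$. The sketch below is illustrative only: the helper names `is_balanced`, `span_gf2` and `is_balancing_set` are assumptions introduced here, and the check enumerates all $2^n$ words, which is consistent with the hardness result stated above only in the sense that no efficient general method is claimed.

```python
from itertools import product


def is_balanced(word):
    """A binary word of even length n is balanced if its Hamming weight is n/2."""
    return 2 * sum(word) == len(word)


def span_gf2(generators, n):
    """All GF(2) linear combinations of the given length-n generator vectors."""
    span = {tuple([0] * n)}
    for g in generators:
        # Adding g to every vector found so far doubles (at most) the span.
        span |= {tuple(a ^ b for a, b in zip(v, g)) for v in span}
    return span


def is_balancing_set(code, n):
    """True if every y in F^n has some x in `code` with y + x balanced."""
    return all(
        any(is_balanced([a ^ b for a, b in zip(y, x)]) for x in code)
        for y in product((0, 1), repeat=n)
    )


print(is_balancing_set(span_gf2([(0, 1)], 2), 2))
print(is_balancing_set(span_gf2([(1, 1, 0, 0)], 4), 4))
```

For $n = 2$ the one-dimensional subspace spanned by $(0,1)$ already balances every word, while for $n = 4$ the subspace spanned by $(1,1,0,0)$ fails on words of weight one.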

preprint2005arXiv

Construction of Turbo Code Interleavers from 3-Regular Hamiltonian Graphs

In this letter we present a new construction of interleavers for turbo codes from 3-regular Hamiltonian graphs. The interleavers can be generated using a few parameters, which can be selected in such a way that the girth of the interleaver graph (IG) becomes large, inducing a high summary distance. The size of the search space for these parameters is derived. The proposed interleavers themselves work as their de-interleavers.

33 published item(s)

Approximate Distributed Coded Computing: Polynomial Codes and Randomized Sketching

Binary Iterative Hard Thresholding Converges with Optimal Number of Measurements for 1-Bit Compressed Sensing

Decentralized Competing Bandits in Non-Stationary Matching Markets

Lower Bounds on the Total Variation Distance Between Mixtures of Two Gaussians

On Learning Mixture of Linear Regressions in the Non-Realizable Setting

Support Recovery in Mixture Models with Sparse Parameters

A workload-adaptive mechanism for linear queries under local differential privacy

Algebraic and Analytic Approaches for Parameter Learning in Mixture Models

Connectivity in Random Annulus Graphs and the Geometric Block Model

Recovery of Sparse Signals from a Mixture of Linear Samples

Reliable Distributed Clustering with Redundant Data Assignment

Storage Capacity of Repairable Networks

Associative Memory using Dictionary Learning and Expander Decoding

Bounds on the Rate of Linear Locally Repairable Codes over Small Alphabets

Clustering Via Crowdsourcing

Cooperative Local Repair in Distributed Storage

Local Partial Clique Covers for Index Coding

Nonadaptive group testing with random set of defectives

Security in Locally Repairable Storage

An Upper Bound On the Size of Locally Recoverable Codes

Compression in the Space of Permutations

Restricted isometry property of random subdictionaries

On a Duality Between Recoverable Distributed Storage and Index Coding

On the Capacity of Memoryless Adversary

Random Subdictionaries and Coherence Conditions for Sparse Signal Recovery

Update-Efficiency and Local Repairability Limits for Capacity Approaching Codes

Construction of Almost Disjunct Matrices for Group Testing

Constructions of Rank Modulation Codes

On the Number of Errors Correctable with Codes on Graphs

Codes in Permutations and Error Correction for Rank Modulation

Coding for High-Density Recording on a 1-D Granular Magnetic Medium

On linear balancing sets

Construction of Turbo Code Interleavers from 3-Regular Hamiltonian Graphs