Source author record

Anindya De

Anindya De appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Computational Complexity, Data Structures and Algorithms, math.PR, Machine Learning, Computer Science and Game Theory, Cryptography and Security, quant-ph, Discrete Mathematics, math.ST, Statistics Theory

Catalog footprint

What is connected

22 works

10 topics

4 close collaborators


preprint, 2022, arXiv

Reconstructing Ultrametric Trees from Noisy Experiments

The problem of reconstructing evolutionary trees or phylogenies is of great interest in computational biology. A popular model for this problem assumes that we are given the set of leaves (current species) of an unknown binary tree and the results of `experiments' on triples of leaves (a,b,c), which return the pair with the deepest least common ancestor. If the tree is assumed to be an ultrametric (i.e., all root-leaf paths have the same length), the experiment can be equivalently seen to return the closest pair of leaves. In this model, efficient algorithms are known for tree reconstruction. In reality, since the data on which these `experiments' are run is itself generated by the stochastic process of evolution, these experiments are noisy. In all reasonable models of evolution, if the branches leading to the leaves in a triple separate from each other at common ancestors that are very close to each other in the tree, the result of the experiment should be close to uniformly random. Motivated by this, we consider a model where the noise on any triple is just dependent on the three pairwise distances (referred to as distance based noise). Our results are the following: 1. Suppose the length of every edge in the unknown tree is at least $\tilde{O}(\frac{1}{\sqrt n})$ fraction of the length of a root-leaf path. Then, we give an efficient algorithm to reconstruct the topology of the tree for a broad family of distance-based noise models. Further, we show that if the edges are asymptotically shorter, then topology reconstruction is information-theoretically impossible. 2. Further, for a specific distance-based noise model--which we refer to as the homogeneous noise model--we show that the edge weights can also be approximately reconstructed under the same quantitative lower bound on the edge lengths.
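The noiseless triple experiment described in this abstract can be illustrated with a minimal sketch, assuming the unknown tree is summarized by a hypothetical ultrametric distance matrix `dist`. The function name `closest_pair` and the four-leaf example are illustrative assumptions; none of the paper's distance-based noise models or reconstruction algorithms are implemented here.

```python
from itertools import combinations

def closest_pair(dist, a, b, c):
    """Noiseless experiment on the triple (a, b, c): return the closest pair.

    In an ultrametric tree this is the pair with the deepest least common ancestor.
    """
    pairs = [(a, b), (a, c), (b, c)]
    return min(pairs, key=lambda pair: dist[pair[0]][pair[1]])

# Hypothetical ultrametric on four leaves with topology ((0, 1), (2, 3)).
dist = [
    [0, 2, 6, 6],
    [2, 0, 6, 6],
    [6, 6, 0, 4],
    [6, 6, 4, 0],
]

# Run the experiment on every triple of leaves.
results = {t: closest_pair(dist, *t) for t in combinations(range(4), 3)}
```

Each entry of `results` maps a triple to the pair the experiment reports; a noisy variant would randomize this answer as a function of the three pairwise distances.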

preprint, 2021, arXiv

Learning a mixture of two subspaces over finite fields

We study the problem of learning a mixture of two subspaces over $\mathbb{F}_2^n$. The goal is to recover the individual subspaces, given samples from a (weighted) mixture of samples drawn uniformly from the two subspaces $A_0$ and $A_1$. This problem is computationally challenging, as it captures the notorious problem of "learning parities with noise" in the degenerate setting when $A_1 \subseteq A_0$. This is in contrast to the analogous problem over the reals that can be solved in polynomial time (Vidal'03). This leads to the following natural question: is Learning Parities with Noise the only computational barrier in obtaining efficient algorithms for learning mixtures of subspaces over $\mathbb{F}_2^n$? The main result of this paper is an affirmative answer to the above question. Namely, we show the following results: 1. When the subspaces $A_0$ and $A_1$ are incomparable, i.e., $A_0$ and $A_1$ are not contained inside each other, then there is a polynomial time algorithm to recover the subspaces $A_0$ and $A_1$. 2. In the case when $A_1$ is a subspace of $A_0$ with a significant gap in the dimension i.e., $dim(A_1) \le αdim(A_0)$ for $α<1$, there is a $n^{O(1/(1-α))}$ time algorithm to recover the subspaces $A_0$ and $A_1$. Thus, our algorithms imply computational tractability of the problem of learning mixtures of two subspaces, except in the degenerate setting captured by learning parities with noise.
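The sampling model in this abstract can be sketched as follows. This is only an illustration of how samples are generated, assuming each subspace is given by a linearly independent basis over $\mathbb{F}_2$; the names `sample_subspace`, `sample_mixture`, `A0_basis` and `A1_basis` are hypothetical, and no learning algorithm is shown.

```python
import random

def sample_subspace(basis, rng):
    """Uniform sample from the span of a linearly independent basis over F_2."""
    x = [0] * len(basis[0])
    for v in basis:
        # Include each basis vector with probability 1/2 (a uniform coefficient in F_2).
        if rng.random() < 0.5:
            x = [a ^ b for a, b in zip(x, v)]
    return tuple(x)

def sample_mixture(basis0, basis1, weight0, rng):
    """Draw from A_0 with probability weight0, otherwise from A_1."""
    basis = basis0 if rng.random() < weight0 else basis1
    return sample_subspace(basis, rng)

# Hypothetical incomparable subspaces of F_2^3.
A0_basis = [(1, 0, 0), (0, 1, 0)]
A1_basis = [(0, 0, 1)]

rng = random.Random(0)
samples = [sample_mixture(A0_basis, A1_basis, 0.5, rng) for _ in range(10)]
```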

preprint, 2021, arXiv

Robust testing of low-dimensional functions

A natural problem in high-dimensional inference is to decide if a classifier $f:\mathbb{R}^n \rightarrow \{-1,1\}$ depends on a small number of linear directions of its input data. Call a function $g: \mathbb{R}^n \rightarrow \{-1,1\}$, a linear $k$-junta if it is completely determined by some $k$-dimensional subspace of the input space. A recent work of the authors showed that linear $k$-juntas are testable. Thus there exists an algorithm to distinguish between: 1. $f: \mathbb{R}^n \rightarrow \{-1,1\}$ which is a linear $k$-junta with surface area $s$, 2. $f$ is $ε$-far from any linear $k$-junta with surface area $(1+ε)s$, where the query complexity of the algorithm is independent of the ambient dimension $n$. Following the surge of interest in noise-tolerant property testing, in this paper we prove a noise-tolerant (or robust) version of this result. Namely, we give an algorithm which given any $c>0$, $ε>0$, distinguishes between 1. $f: \mathbb{R}^n \rightarrow \{-1,1\}$ has correlation at least $c$ with some linear $k$-junta with surface area $s$. 2. $f$ has correlation at most $c-ε$ with any linear $k$-junta with surface area at most $s$. The query complexity of our tester is $k^{\mathsf{poly}(s/ε)}$. Using our techniques, we also obtain a fully noise tolerant tester with the same query complexity for any class $\mathcal{C}$ of linear $k$-juntas with surface area bounded by $s$. As a consequence, we obtain a fully noise tolerant tester with query complexity $k^{O(\mathsf{poly}(\log k/ε))}$ for the class of intersection of $k$-halfspaces (for constant $k$) over the Gaussian space. Our query complexity is independent of the ambient dimension $n$. Previously, no non-trivial noise tolerant testers were known even for a single halfspace.

preprint, 2020, arXiv

An Efficient PTAS for Stochastic Load Balancing with Poisson Jobs

We give the first polynomial-time approximation scheme (PTAS) for the stochastic load balancing problem when the job sizes follow Poisson distributions. This improves upon the 2-approximation algorithm due to Goel and Indyk (FOCS'99). Moreover, our approximation scheme is an efficient PTAS that has a running time double exponential in $1/ε$ but nearly-linear in $n$, where $n$ is the number of jobs and $ε$ is the target error. Previously, a PTAS (not efficient) was only known for jobs that obey exponential distributions (Goel and Indyk, FOCS'99). Our algorithm relies on several probabilistic ingredients including some (seemingly) new results on scaling and the so-called "focusing effect" of maximum of Poisson random variables which might be of independent interest.

preprint, 2020, arXiv

Polynomial-time trace reconstruction in the smoothed complexity model

In the \emph{trace reconstruction problem}, an unknown source string $x \in \{0,1\}^n$ is sent through a probabilistic \emph{deletion channel} which independently deletes each bit with probability $δ$ and concatenates the surviving bits, yielding a \emph{trace} of $x$. The problem is to reconstruct $x$ given independent traces. This problem has received much attention in recent years both in the worst-case setting where $x$ may be an arbitrary string in $\{0,1\}^n$ \cite{DOS17,NazarovPeres17,HHP18,HL18,Chase19} and in the average-case setting where $x$ is drawn uniformly at random from $\{0,1\}^n$ \cite{PeresZhai17,HPP18,HL18,Chase19}. This paper studies trace reconstruction in the \emph{smoothed analysis} setting, in which a ``worst-case'' string $x^{\worst}$ is chosen arbitrarily from $\{0,1\}^n$, and then a perturbed version $\bx$ of $x^{\worst}$ is formed by independently replacing each coordinate by a uniform random bit with probability $σ$. The problem is to reconstruct $\bx$ given independent traces from it. Our main result is an algorithm which, for any constant perturbation rate $0<σ< 1$ and any constant deletion rate $0 < δ< 1$, uses $\poly(n)$ running time and traces and succeeds with high probability in reconstructing the string $\bx$. This stands in contrast with the worst-case version of the problem, for which $\text{exp}(O(n^{1/3}))$ is the best known time and sample complexity \cite{DOS17,NazarovPeres17}. Our approach is based on reconstructing $\bx$ from the multiset of its short subwords and is quite different from previous algorithms for either the worst-case or average-case versions of the problem. The heart of our work is a new $\poly(n)$-time procedure for reconstructing the multiset of all $O(\log n)$-length subwords of any source string $x\in \{0,1\}^n$ given access to traces of $x$.
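The two random processes in this abstract, the smoothing perturbation and the deletion channel, can be simulated in a few lines. This is a minimal sketch of the data-generation model only, with hypothetical function names `perturb` and `trace`; the reconstruction algorithm itself is not shown.

```python
import random

def perturb(x_worst, sigma, rng):
    """Independently replace each coordinate by a uniform random bit with probability sigma."""
    return [rng.randrange(2) if rng.random() < sigma else bit for bit in x_worst]

def trace(x, delta, rng):
    """Deletion channel: delete each bit independently with probability delta,
    then concatenate the surviving bits."""
    return [bit for bit in x if rng.random() >= delta]

rng = random.Random(0)
x_worst = [0, 0, 0, 0, 1, 1, 1, 1]          # hypothetical worst-case string
x = perturb(x_worst, sigma=0.25, rng=rng)   # smoothed source string
traces = [trace(x, delta=0.3, rng=rng) for _ in range(5)]
```

Every trace is a subsequence of the perturbed string `x`, and the reconstruction task is to recover `x` from such traces alone.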

preprint, 2020, arXiv

Reconstructing weighted voting schemes from partial information about their power indices

A number of recent works [Goldberg 2006; O'Donnell and Servedio 2011; De, Diakonikolas, and Servedio 2017; De, Diakonikolas, Feldman, and Servedio 2014] have considered the problem of approximately reconstructing an unknown weighted voting scheme given information about various sorts of ``power indices'' that characterize the level of control that individual voters have over the final outcome. In the language of theoretical computer science, this is the problem of approximating an unknown linear threshold function (LTF) over $\{-1, 1\}^n$ given some numerical measure (such as the function's $n$ ``Chow parameters,'' a.k.a. its degree-1 Fourier coefficients, or the vector of its $n$ Shapley indices) of how much each of the $n$ individual input variables affects the outcome of the function. In this paper we consider the problem of reconstructing an LTF given only partial information about its Chow parameters or Shapley indices; i.e. we are given only the Chow parameters or the Shapley indices corresponding to a subset $S \subseteq [n]$ of the $n$ input variables. A natural goal in this partial information setting is to find an LTF whose Chow parameters or Shapley indices corresponding to indices in $S$ accurately match the given Chow parameters or Shapley indices of the unknown LTF. We refer to this as the Partial Inverse Power Index Problem. Our main results are a polynomial time algorithm for the ($\varepsilon$-approximate) Chow Parameters Partial Inverse Power Index Problem and a quasi-polynomial time algorithm for the ($\varepsilon$-approximate) Shapley Indices Partial Inverse Power Index Problem.

preprint, 2016, arXiv

A Size-Free CLT for Poisson Multinomials and its Applications

An $(n,k)$-Poisson Multinomial Distribution (PMD) is the distribution of the sum of $n$ independent random vectors supported on the set ${\cal B}_k=\{e_1,\ldots,e_k\}$ of standard basis vectors in $\mathbb{R}^k$. We show that any $(n,k)$-PMD is ${\rm poly}\left({k\over σ}\right)$-close in total variation distance to the (appropriately discretized) multi-dimensional Gaussian with the same first two moments, removing the dependence on $n$ from the Central Limit Theorem of Valiant and Valiant. Interestingly, our CLT is obtained by bootstrapping the Valiant-Valiant CLT itself through the structural characterization of PMDs shown in recent work by Daskalakis, Kamath, and Tzamos. In turn, our stronger CLT can be leveraged to obtain an efficient PTAS for approximate Nash equilibria in anonymous games, significantly improving the state of the art, and matching qualitatively the running time dependence on $n$ and $1/\varepsilon$ of the best known algorithm for two-strategy anonymous games. Our new CLT also enables the construction of covers for the set of $(n,k)$-PMDs, which are proper and whose size is shown to be essentially optimal. Our cover construction combines our CLT with the Shapley-Folkman theorem and recent sparsification results for Laplacian matrices by Batson, Spielman, and Srivastava. Our cover size lower bound is based on an algebraic geometric construction. Finally, leveraging the structural properties of the Fourier spectrum of PMDs we show that these distributions can be learned from $O_k(1/\varepsilon^2)$ samples in ${\rm poly}_k(1/\varepsilon)$-time, removing the quasi-polynomial dependence of the running time on $1/\varepsilon$ from the algorithm of Daskalakis, Kamath, and Tzamos.

preprint, 2016, arXiv

Noisy population recovery in polynomial time

In the noisy population recovery problem of Dvir et al., the goal is to learn an unknown distribution $f$ on binary strings of length $n$ from noisy samples. For some parameter $μ\in [0,1]$, a noisy sample is generated by flipping each coordinate of a sample from $f$ independently with probability $(1-μ)/2$. We assume an upper bound $k$ on the size of the support of the distribution, and the goal is to estimate the probability of any string to within some given error $\varepsilon$. It is known that the algorithmic complexity and sample complexity of this problem are polynomially related to each other. We show that for $μ> 0$, the sample complexity (and hence the algorithmic complexity) is bounded by a polynomial in $k$, $n$ and $1/\varepsilon$ improving upon the previous best result of $\mathsf{poly}(k^{\log\log k},n,1/\varepsilon)$ due to Lovett and Zhang. Our proof combines ideas from Lovett and Zhang with a \emph{noise attenuated} version of Möbius inversion. In turn, the latter crucially uses the construction of \emph{robust local inverse} due to Moitra and Saks.
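The noisy sampling process defined in this abstract can be written down directly. The sketch below assumes a hypothetical distribution given as a list of support strings with probabilities; `noisy_sample` is an illustrative name, and the recovery algorithm is not implemented.

```python
import random

def noisy_sample(support, probs, mu, rng):
    """Draw a string from f, then flip each coordinate independently
    with probability (1 - mu) / 2."""
    clean = rng.choices(support, weights=probs, k=1)[0]
    flip_prob = (1 - mu) / 2
    return tuple(bit ^ (rng.random() < flip_prob) for bit in clean)

# Hypothetical distribution f with support size k = 2 on strings of length n = 3.
support = [(0, 0, 1), (1, 1, 0)]
probs = [0.7, 0.3]

rng = random.Random(0)
samples = [noisy_sample(support, probs, mu=0.5, rng=rng) for _ in range(10)]
```

With `mu=1` no coordinate is flipped, and as `mu` approaches 0 each coordinate approaches a uniform random bit, which is why the regime $μ > 0$ is the interesting one.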

preprint, 2014, arXiv

Boolean function monotonicity testing requires (almost) $n^{1/2}$ non-adaptive queries

We prove a lower bound of $Ω(n^{1/2 - c})$, for all $c>0$, on the query complexity of (two-sided error) non-adaptive algorithms for testing whether an $n$-variable Boolean function is monotone versus constant-far from monotone. This improves a $\tildeΩ(n^{1/5})$ lower bound for the same problem that was recently given in [CST14] and is very close to $Ω(n^{1/2})$, which we conjecture is the optimal lower bound for this model.
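For reference, the property being tested can be stated as an exhaustive check. The sketch below is not a property tester: it reads all $2^n$ values of the function, whereas the abstract concerns algorithms that make far fewer queries. The name `is_monotone` and the two example functions are assumptions made for illustration.

```python
from itertools import product

def is_monotone(f, n):
    """Exhaustively check that flipping any input bit from 0 to 1 never decreases f."""
    for x in product((0, 1), repeat=n):
        for i in range(n):
            if x[i] == 0:
                y = x[:i] + (1,) + x[i + 1:]
                if f(x) > f(y):
                    return False
    return True

majority = lambda x: int(sum(x) >= 2)   # monotone on 3 bits
parity = lambda x: sum(x) % 2           # not monotone

majority_ok = is_monotone(majority, 3)
parity_ok = is_monotone(parity, 3)
```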

preprint, 2013, arXiv

A Polynomial-time Approximation Scheme for Fault-tolerant Distributed Storage

We consider a problem which has received considerable attention in systems literature because of its applications to routing in delay tolerant networks and replica placement in distributed storage systems. In abstract terms the problem can be stated as follows: Given a random variable $X$ generated by a known product distribution over $\{0,1\}^n$ and a target value $0 \leq θ\leq 1$, output a non-negative vector $w$, with $\|w\|_1 \le 1$, which maximizes the probability of the event $w \cdot X \ge θ$. This is a challenging non-convex optimization problem for which even computing the value $\Pr[w \cdot X \ge θ]$ of a proposed solution vector $w$ is #P-hard. We provide an additive EPTAS for this problem which, for constant-bounded product distributions, runs in $ \poly(n) \cdot 2^{\poly(1/\eps)}$ time and outputs an $\eps$-approximately optimal solution vector $w$ for this problem. Our approach is inspired by, and extends, recent structural results from the complexity-theoretic study of linear threshold functions. Furthermore, in spite of the objective function being non-smooth, we give a \emph{unicriterion} PTAS while previous work for such objective functions has typically led to a \emph{bicriterion} PTAS. We believe our techniques may be applicable to get unicriterion PTAS for other non-smooth objective functions.
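The objective $\Pr[w \cdot X \ge θ]$ can be evaluated by brute force for small $n$, which makes the problem statement concrete. The sketch below enumerates all $2^n$ outcomes, so it takes exponential time (consistent with the #P-hardness noted in the abstract) and is not the paper's approximation scheme. The name `success_probability` and the example numbers are assumptions.

```python
from itertools import product

def success_probability(w, p, theta):
    """Exact Pr[w . X >= theta] when X_i = 1 independently with probability p[i]."""
    total = 0.0
    for x in product((0, 1), repeat=len(p)):
        # Probability of this outcome under the product distribution.
        prob = 1.0
        for xi, pi in zip(x, p):
            prob *= pi if xi else 1 - pi
        if sum(wi * xi for wi, xi in zip(w, x)) >= theta:
            total += prob
    return total

p = [0.5, 0.5]
spread = success_probability([0.5, 0.5], p, theta=1.0)        # needs both coordinates
concentrated = success_probability([1.0, 0.0], p, theta=1.0)  # needs only the first
```

In this toy instance, putting all weight on one coordinate does better than spreading it, which hints at why the optimization over $w$ is non-convex.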

preprint, 2013, arXiv

A robust Khintchine inequality, and algorithms for computing optimal constants in Fourier analysis and high-dimensional geometry

This paper makes two contributions towards determining some well-studied optimal constants in Fourier analysis of Boolean functions and high-dimensional geometry. 1. It has been known since 1994 \cite{GL:94} that every linear threshold function has squared Fourier mass at least 1/2 on its degree-0 and degree-1 coefficients. Denote the minimum such Fourier mass by $\mathbf{W}^{\leq 1}[\mathbf{LTF}]$, where the minimum is taken over all $n$-variable linear threshold functions and all $n \ge 0$. Benjamini, Kalai and Schramm \cite{BKS:99} have conjectured that the true value of $\mathbf{W}^{\leq 1}[\mathbf{LTF}]$ is $2/π$. We make progress on this conjecture by proving that $\mathbf{W}^{\leq 1}[\mathbf{LTF}] \geq 1/2 + c$ for some absolute constant $c>0$. The key ingredient in our proof is a "robust" version of the well-known Khintchine inequality in functional analysis, which we believe may be of independent interest. 2. We give an algorithm with the following property: given any $η> 0$, the algorithm runs in time $2^{\mathrm{poly}(1/η)}$ and determines the value of $\mathbf{W}^{\leq 1}[\mathbf{LTF}]$ up to an additive error of $\pm η$. We give a similar $2^{\mathrm{poly}(1/η)}$-time algorithm to determine \emph{Tomaszewski's constant} to within an additive error of $\pm η$; this is the minimum (over all origin-centered hyperplanes $H$) fraction of points in $\{-1,1\}^n$ that lie within Euclidean distance 1 of $H$. Tomaszewski's constant is conjectured to be 1/2; lower bounds on it have been given by Holzman and Kleitman \cite{HK92} and independently by Ben-Tal, Nemirovski and Roos \cite{BNR02}. Our algorithms combine tools from anti-concentration of sums of independent random variables, Fourier analysis, and Hermite analysis of linear threshold functions.

preprint, 2013, arXiv

Deterministic Approximate Counting for Degree-$2$ Polynomial Threshold Functions

We give a {\em deterministic} algorithm for approximately computing the fraction of Boolean assignments that satisfy a degree-$2$ polynomial threshold function. Given a degree-2 input polynomial $p(x_1,\dots,x_n)$ and a parameter $\eps > 0$, the algorithm approximates \[ \Pr_{x \sim \{-1,1\}^n}[p(x) \geq 0] \] to within an additive $\pm \eps$ in time $\poly(n,2^{\poly(1/\eps)})$. Note that it is NP-hard to determine whether the above probability is nonzero, so any sort of multiplicative approximation is almost certainly impossible even for efficient randomized algorithms. This is the first deterministic algorithm for this counting problem in which the running time is polynomial in $n$ for $\eps= o(1)$. For "regular" polynomials $p$ (those in which no individual variable's influence is large compared to the sum of all $n$ variable influences) our algorithm runs in $\poly(n,1/\eps)$ time. The algorithm also runs in $\poly(n,1/\eps)$ time to approximate $\Pr_{x \sim N(0,1)^n}[p(x) \geq 0]$ to within an additive $\pm \eps$, for any degree-2 polynomial $p$. As an application of our counting result, we give a deterministic FPT multiplicative $(1 \pm \eps)$-approximation algorithm to approximate the $k$-th absolute moment $\E_{x \sim \{-1,1\}^n}[|p(x)^k|]$ of a degree-2 polynomial. The algorithm's running time is of the form $\poly(n) \cdot f(k,1/\eps)$.
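The quantity being approximated, $\Pr_{x \sim \{-1,1\}^n}[p(x) \geq 0]$, can be computed exactly by enumeration when $n$ is tiny. The sketch below is this exponential-time baseline, not the deterministic algorithm of the paper; `fraction_nonnegative` and the example polynomial are hypothetical.

```python
from itertools import product

def fraction_nonnegative(p, n):
    """Exact fraction of points x in {-1, 1}^n with p(x) >= 0, by full enumeration."""
    count = sum(1 for x in product((-1, 1), repeat=n) if p(x) >= 0)
    return count / 2 ** n

# Hypothetical degree-2 polynomial p(x) = x1*x2 + x3 - 1/2.
p = lambda x: x[0] * x[1] + x[2] - 0.5

estimate = fraction_nonnegative(p, 3)
```

Here `p(x) >= 0` requires both `x1*x2 = 1` and `x3 = 1`, which holds on 2 of the 8 points.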

preprint, 2013, arXiv

Deterministic Approximate Counting for Juntas of Degree-$2$ Polynomial Threshold Functions

Let $g: \{-1,1\}^k \to \{-1,1\}$ be any Boolean function and $q_1,\dots,q_k$ be any degree-2 polynomials over $\{-1,1\}^n.$ We give a \emph{deterministic} algorithm which, given as input explicit descriptions of $g,q_1,\dots,q_k$ and an accuracy parameter $\eps>0$, approximates \[\Pr_{x \sim \{-1,1\}^n}[g(\sign(q_1(x)),\dots,\sign(q_k(x)))=1]\] to within an additive $\pm \eps$. For any constant $\eps > 0$ and $k \geq 1$ the running time of our algorithm is a fixed polynomial in $n$. This is the first fixed polynomial-time algorithm that can deterministically approximately count satisfying assignments of a natural class of depth-3 Boolean circuits. Our algorithm extends a recent result \cite{DDS13:deg2count} which gave a deterministic approximate counting algorithm for a single degree-2 polynomial threshold function $\sign(q(x)),$ corresponding to the $k=1$ case of our result. Our algorithm and analysis requires several novel technical ingredients that go significantly beyond the tools required to handle the $k=1$ case in \cite{DDS13:deg2count}. One of these is a new multidimensional central limit theorem for degree-2 polynomials in Gaussian random variables which builds on recent Malliavin-calculus-based results from probability theory. We use this CLT as the basis of a new decomposition technique for $k$-tuples of degree-2 Gaussian polynomials and thus obtain an efficient deterministic approximate counting algorithm for the Gaussian distribution. Finally, a third new ingredient is a "regularity lemma" for \emph{$k$-tuples} of degree-$d$ polynomial threshold functions. This generalizes both the regularity lemmas of \cite{DSTW:10,HKM:09} and the regularity lemma of Gopalan et al \cite{GOWZ10}. Our new regularity lemma lets us extend our deterministic approximate counting results from the Gaussian to the Boolean domain.

preprint, 2013, arXiv

Efficient deterministic approximate counting for low-degree polynomial threshold functions

We give a deterministic algorithm for approximately counting satisfying assignments of a degree-$d$ polynomial threshold function (PTF). Given a degree-$d$ input polynomial $p(x_1,\dots,x_n)$ over $R^n$ and a parameter $ε> 0$, our algorithm approximates $\Pr_{x \sim \{-1,1\}^n}[p(x) \geq 0]$ to within an additive $\pm ε$ in time $O_{d,ε}(1)\cdot \mathop{poly}(n^d)$. (Any sort of efficient multiplicative approximation is impossible even for randomized algorithms assuming $NP\not=RP$.) Note that the running time of our algorithm (as a function of $n^d$, the number of coefficients of a degree-$d$ PTF) is a \emph{fixed} polynomial. The fastest previous algorithm for this problem (due to Kane), based on constructions of unconditional pseudorandom generators for degree-$d$ PTFs, runs in time $n^{O_{d,c}(1) \cdot ε^{-c}}$ for all $c > 0$. The key novel contributions of this work are: A new multivariate central limit theorem, proved using tools from Malliavin calculus and Stein's Method. This new CLT shows that any collection of Gaussian polynomials with small eigenvalues must have a joint distribution which is very close to a multidimensional Gaussian distribution. A new decomposition of low-degree multilinear polynomials over Gaussian inputs. Roughly speaking we show that (up to some small error) any such polynomial can be decomposed into a bounded number of multilinear polynomials all of which have extremely small eigenvalues. We use these new ingredients to give a deterministic algorithm for a Gaussian-space version of the approximate counting problem, and then employ standard techniques for working with low-degree PTFs (invariance principles and regularity lemmas) to reduce the original approximate counting problem over the Boolean hypercube to the Gaussian version.

preprint, 2013, arXiv

Explicit Optimal Hardness via Gaussian stability results

The results of Raghavendra (2008) show that assuming Khot's Unique Games Conjecture (2002), for every constraint satisfaction problem there exists a generic semi-definite program that achieves the optimal approximation factor. This result is existential, as it neither provides an explicit optimal rounding procedure nor allows one to calculate exactly the Unique Games hardness of the problem. Obtaining an explicit optimal approximation scheme and the corresponding approximation factor is a difficult challenge for each specific approximation problem. An approach for determining the exact approximation factor and the corresponding optimal rounding was established in the analysis of MAX-CUT (KKMO 2004) and the use of the Invariance Principle (MOO 2005). However, this approach crucially relies on results explicitly proving optimal partitions in Gaussian space. Until recently, Borell's result (Borell 1985) was the only non-trivial Gaussian partition result known. In this paper we derive the first explicit optimal approximation algorithm and the corresponding approximation factor using a new result on Gaussian partitions due to Isaksson and Mossel (2012). This Gaussian result allows us to determine exactly the Unique Games Hardness of MAX-3-EQUAL. In particular, our results show that Zwick's algorithm for this problem achieves the optimal approximation factor and prove that the approximation achieved by the algorithm is $\approx 0.796$ as conjectured by Zwick. We further use the previously known optimal Gaussian partition results to obtain a new Unique Games Hardness factor for MAX-k-CSP: using the well-known fact that jointly normal pairwise independent random variables are fully independent, we show that the UGC hardness of Max-k-CSP is $\frac{\lceil (k+1)/2 \rceil}{2^{k-1}}$, improving on results of Austrin and Mossel (2009).

preprint, 2012, arXiv

Inverse problems in approximate uniform generation

We initiate the study of \emph{inverse} problems in approximate uniform generation, focusing on uniform generation of satisfying assignments of various types of Boolean functions. In such an inverse problem, the algorithm is given uniform random satisfying assignments of an unknown function $f$ belonging to a class $\C$ of Boolean functions, and the goal is to output a probability distribution $D$ which is $ε$-close, in total variation distance, to the uniform distribution over $f^{-1}(1)$. Positive results: We prove a general positive result establishing sufficient conditions for efficient inverse approximate uniform generation for a class $\C$. We define a new type of algorithm called a \emph{densifier} for $\C$, and show (roughly speaking) how to combine (i) a densifier, (ii) an approximate counting / uniform generation algorithm, and (iii) a Statistical Query learning algorithm, to obtain an inverse approximate uniform generation algorithm. We apply this general result to obtain a poly$(n,1/\eps)$-time algorithm for the class of halfspaces; and a quasipoly$(n,1/\eps)$-time algorithm for the class of $\poly(n)$-size DNF formulas. Negative results: We prove a general negative result establishing that the existence of certain types of signature schemes in cryptography implies the hardness of certain inverse approximate uniform generation problems. This implies that there are no {subexponential}-time inverse approximate uniform generation algorithms for 3-CNF formulas; for intersections of two halfspaces; for degree-2 polynomial threshold functions; and for monotone 2-CNF formulas. Finally, we show that there is no general relationship between the complexity of the "forward" approximate uniform generation problem and the complexity of the inverse problem for a class $\C$ -- it is possible for either one to be easy while the other is hard.

preprint, 2012, arXiv

Majority is Stablest : Discrete and SoS

The Majority is Stablest Theorem has numerous applications in hardness of approximation and social choice theory. We give a new proof of the Majority is Stablest Theorem by induction on the dimension of the discrete cube. Unlike the previous proof, it uses neither the "invariance principle" nor Borell's result in Gaussian space. The new proof is general enough to include all previous variants of Majority is Stablest such as "it ain't over until it's over" and "Majority is most predictable". Moreover, the new proof allows us to derive a proof of Majority is Stablest in a constant level of the Sum of Squares hierarchy. This implies in particular that the Khot-Vishnoi instance of Max-Cut does not provide a gap instance for the Lasserre hierarchy.

preprint, 2012, arXiv

Nearly optimal solutions for the Chow Parameters Problem and low-weight approximation of halfspaces

The \emph{Chow parameters} of a Boolean function $f: \{-1,1\}^n \to \{-1,1\}$ are its $n+1$ degree-0 and degree-1 Fourier coefficients. It has been known since 1961 (Chow, Tannenbaum) that the (exact values of the) Chow parameters of any linear threshold function $f$ uniquely specify $f$ within the space of all Boolean functions, but until recently (O'Donnell and Servedio) nothing was known about efficient algorithms for \emph{reconstructing} $f$ (exactly or approximately) from exact or approximate values of its Chow parameters. We refer to this reconstruction problem as the \emph{Chow Parameters Problem.} Our main result is a new algorithm for the Chow Parameters Problem which, given (sufficiently accurate approximations to) the Chow parameters of any linear threshold function $f$, runs in time $\tilde{O}(n^2)\cdot (1/\eps)^{O(\log^2(1/\eps))}$ and with high probability outputs a representation of an LTF $f'$ that is $\eps$-close to $f$. The only previous algorithm (O'Donnell and Servedio) had running time $\poly(n) \cdot 2^{2^{\tilde{O}(1/\eps^2)}}.$ As a byproduct of our approach, we show that for any linear threshold function $f$ over $\{-1,1\}^n$, there is a linear threshold function $f'$ which is $\eps$-close to $f$ and has all weights that are integers at most $\sqrt{n} \cdot (1/\eps)^{O(\log^2(1/\eps))}$. This significantly improves the best previous result of Diakonikolas and Servedio which gave a $\poly(n) \cdot 2^{\tilde{O}(1/\eps^{2/3})}$ weight bound, and is close to the known lower bound of $\max\{\sqrt{n},$ $(1/\eps)^{Ω(\log \log (1/\eps))}\}$ (Goldberg, Servedio). Our techniques also yield improved algorithms for related problems in learning theory.
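The definition of the Chow parameters in this abstract translates directly into a brute-force computation for small $n$. The sketch below covers only the forward direction (from an LTF to its Chow parameters), not the reconstruction problem the paper solves. The helper names `ltf` and `chow_parameters`, and the convention that ties map to $+1$, are assumptions.

```python
from itertools import product

def ltf(weights, theta):
    """Linear threshold function x -> sign(w . x - theta), with ties mapped to +1."""
    return lambda x: 1 if sum(w * xi for w, xi in zip(weights, x)) - theta >= 0 else -1

def chow_parameters(f, n):
    """The n + 1 degree-0 and degree-1 Fourier coefficients: E[f(x)] and E[f(x) * x_i]."""
    points = list(product((-1, 1), repeat=n))
    total = len(points)
    chow = [sum(f(x) for x in points) / total]
    chow += [sum(f(x) * x[i] for x in points) / total for i in range(n)]
    return chow

majority3 = ltf([1, 1, 1], 0)
params = chow_parameters(majority3, 3)
```

For the majority function on three variables, the degree-0 coefficient is 0 and each degree-1 coefficient is 1/2.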

preprint, 2012, arXiv

The Inverse Shapley Value Problem

For $f$ a weighted voting scheme used by $n$ voters to choose between two candidates, the $n$ \emph{Shapley-Shubik Indices} (or {\em Shapley values}) of $f$ provide a measure of how much control each voter can exert over the overall outcome of the vote. Shapley-Shubik indices were introduced by Lloyd Shapley and Martin Shubik in 1954 \cite{SS54} and are widely studied in social choice theory as a measure of the "influence" of voters. The \emph{Inverse Shapley Value Problem} is the problem of designing a weighted voting scheme which (approximately) achieves a desired input vector of values for the Shapley-Shubik indices. Despite much interest in this problem no provably correct and efficient algorithm was known prior to our work. We give the first efficient algorithm with provable performance guarantees for the Inverse Shapley Value Problem. For any constant $\eps > 0$ our algorithm runs in fixed poly$(n)$ time (the degree of the polynomial is independent of $\eps$) and has the following performance guarantee: given as input a vector of desired Shapley values, if any "reasonable" weighted voting scheme (roughly, one in which the threshold is not too skewed) approximately matches the desired vector of values to within some small error, then our algorithm explicitly outputs a weighted voting scheme that achieves this vector of Shapley values to within error $\eps.$ If there is a "reasonable" voting scheme in which all voting weights are integers at most $\poly(n)$ that approximately achieves the desired Shapley values, then our algorithm runs in time $\poly(n)$ and outputs a weighted voting scheme that achieves the target vector of Shapley values to within error $\eps=n^{-1/8}.$
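The forward direction, from a weighted voting scheme to its Shapley-Shubik indices, can be illustrated with the classical pivotal-voter definition. The sketch below enumerates all $n!$ orderings and so only works for very small $n$; it is not the inverse algorithm from the paper. The function name `shapley_shubik`, the quota convention (a coalition wins when its weight reaches the quota, assumed positive and at most the total weight), and the example weights are assumptions.

```python
from fractions import Fraction
from itertools import permutations

def shapley_shubik(weights, quota):
    """Fraction of voter orderings in which each voter is pivotal, that is,
    the voter whose weight first brings the running total up to the quota."""
    n = len(weights)
    pivots = [0] * n
    orderings = 0
    for order in permutations(range(n)):
        orderings += 1
        running = 0
        for voter in order:
            running += weights[voter]
            if running >= quota:
                pivots[voter] += 1
                break
    return [Fraction(count, orderings) for count in pivots]

# Hypothetical scheme: weights (2, 1, 1) with quota 3.
indices = shapley_shubik([2, 1, 1], 3)
```

In this example the voter with weight 2 is pivotal in 4 of the 6 orderings, so the indices are 2/3, 1/6 and 1/6, even though the weights are in ratio 2:1:1.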

preprint, 2012, arXiv

Trevisan's extractor in the presence of quantum side information

Randomness extraction involves the processing of purely classical information and is therefore usually studied in the framework of classical probability theory. However, such a classical treatment is generally too restrictive for applications, where side information about the values taken by classical random variables may be represented by the state of a quantum system. This is particularly relevant in the context of cryptography, where an adversary may make use of quantum devices. Here, we show that the well-known construction paradigm for extractors proposed by Trevisan is sound in the presence of quantum side information. We exploit the modularity of this paradigm to give several concrete extractor constructions, which, e.g., extract all the conditional (smooth) min-entropy of the source using a seed of length poly-logarithmic in the input, or only require the seed to be weakly random.

preprint, 2011, arXiv

Lower bounds in differential privacy

This is a paper about private data analysis, in which a trusted curator holding a confidential database responds to real vector-valued queries. A common approach to ensuring privacy for the database elements is to add appropriately generated random noise to the answers, releasing only these {\em noisy} responses. In this paper, we investigate various lower bounds on the noise required to maintain different kinds of privacy guarantees.
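The "add noise to the answers" approach can be illustrated with the standard Laplace mechanism. This technique is swapped in here as a familiar example and is not taken from the abstract, which does not name a specific mechanism. The sketch assumes the query has a known L1 sensitivity; the names `laplace_noise` and `noisy_answers` and the numbers used are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two independent
    exponential variables with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_answers(true_answers, sensitivity, epsilon, rng):
    """Laplace mechanism: add independent Laplace(sensitivity / epsilon) noise
    to each coordinate of a vector-valued query answer."""
    scale = sensitivity / epsilon
    return [a + laplace_noise(scale, rng) for a in true_answers]

rng = random.Random(0)
released = noisy_answers([10.0, 20.0, 30.0], sensitivity=1.0, epsilon=0.5, rng=rng)
```

Smaller `epsilon` means a larger noise scale; lower bounds of the kind studied in the paper ask how small such noise can be while still giving a privacy guarantee.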

preprint, 2010, arXiv

Near-optimal extractors against quantum storage

We show that Trevisan's extractor and its variants \cite{T99,RRV99} are secure against bounded quantum storage adversaries. One instantiation gives the first such extractor to achieve an output length $Θ(K-b)$, where $K$ is the source's entropy and $b$ the adversary's storage, together with a poly-logarithmic seed length. Another instantiation achieves a logarithmic key length, with a slightly smaller output length $Θ((K-b)/K^γ)$ for any $γ>0$. In contrast, the previous best construction \cite{TS09} could only extract $(K/b)^{1/15}$ bits. Some of our constructions have the additional advantage that every bit of the output is a function of only a polylogarithmic number of bits from the source, which is crucial for some cryptographic applications. Our argument is based on bounds for a generalization of quantum random access codes, which we call \emph{quantum functional access codes}. This is crucial as it lets us avoid the local list-decoding algorithm central to the approach in \cite{TS09}, which was the source of the multiplicative overhead.
